BUDT704 Team Project Report

Group 13 (0502) - Team Story Sharks

From Pitch to Payout: A Deep Dive into All Shark Tank Pitches </b>¶

Source : https://cdn1.edgedatg.com/

INDEX FOR THE REPORT

  1. Project Introduction / Overview
  2. Choice of heavier grading criteria : Data Analysis
  3. Research Questions
  4. About the Datasets
  5. Data Collection
  6. Exploratory Data Analysis
  7. Data Preprocessing Overview / Summary
  8. Data Preprocessing
  9. Data Analysis Overview / Summary
  10. Research Questions 1 to 5
  11. Conclusion - Summary of our findings
  12. References and Citations
  13. Figures and Tables
  14. Team Members
  15. Link to the PPT

1. PROJECT INTRODUCTION¶

Shark Tank is a renowned TV show where entrepreneurs pitch their ideas to wealthy investors, known as sharks, aiming to secure investment for their startups. Our analysis focuses on understanding the dynamics of successful pitches on the show by utilizing data from two key sources: the "Shark Tank US Dataset," covering seasons 1 to 14, and the "All Shark Tank Pitches Dataset," spanning eight seasons. These datasets provide valuable insights into the decision-making of investors and pitch strategies employed by the entrepreneurs, allowing us to uncover patterns and trends.

Our investigation explores questions related to the temporal patterns of successful pitches, the co-investment preferences among sharks, and the industries attracting the most capital, factors influencing the shark decisions and the impact of presence of guests on the show. The goal is to look at how the show has progressed over the years and to unveil the strategies behind successful pitches, understand what the preferences of the Shark's are, and explore the impact of various factors on the success of a pitch.

2. CHOICE OF HEAVIER GRADING CRITERIA : Data Analysis¶

The project really stands out in how we handle the data, making sure every transformation is spot on. This gives us a solid base for digging into various trends. As we shift gears into the analysis, we get into the nitty-gritty of investment patterns, using techniques like TF-IDF and sentiment analysis.

We zoom in on the lifestyle/home sector, going beyond the usual summaries. This gives us a real inside look into what's going on in the industry. We're not just crunching numbers; we're also looking at the impact of guests, adding a human touch that's often missed in these kinds of analyses.

Our approach is hands-on, using techniques like statistical analysis, visualization, and text analysis. This mix justifies putting a big focus on the data analysis part, showing off how deep we're diving.

To sum it up, our project isn't just about numbers and trends; it's about real insights. We're peeling back the layers of data, like when we delved into the Doorbot example, giving us a solid understanding for making smart decisions and pointing to interesting avenues for future research.

3. RESEARCH QUESTIONS¶

Question 1

Can we identify temporal patterns in pitch success on the show and how do they evolve over the seasons, including viewership trends?

Understanding the temporal dynamics of pitch success is crucial for entrepreneurs seeking optimal timing for their Shark Tank appearance. Analyzing seasonal and monthly trends provides insights into the factors influencing success rates and allows entrepreneurs to strategize effectively.

Question 2

Are there statistically significant co-investment patterns among the sharks, revealing insights into their investment strategies and price discrimination?

Exploring co-investment patterns among sharks’ sheds light on collaborative approaches and potential price discrimination. Entrepreneurs can benefit from understanding these dynamics, tailoring pitches to align with preferred investment strategies. This will also be useful to the entrepreneurs in targeting multiple sharks based on their co-investment strategies and portfolios.

Question 3

Can we analyze trends in pitchers on the show to identify sectors with the highest capital raised and assess how investment trends impact the business pitches?

Examining the patterns in capital raised across various sectors offers valuable insights for entrepreneurs seeking to tailor their pitches to current investment trends. Grasping the influence of these trends on business proposals, along with historical data on sector-specific investments by Sharks, proves beneficial for strategic planning.

Question 4

What factors in Business pitches influence the equity demands of sharks on the show and to what extent do these descriptions impact the likelihood of securing a deal?

It is essential for entrepreneurs to carefully consider the various factors that impact equity demands during negotiations. By delving into aspects such as valuation, revenue projections, and refining negotiation skills, entrepreneurs can adopt a more strategic and informed approach to navigate equity discussions. This comprehensive understanding empowers them to make informed decisions, enhancing their ability to achieve favourable outcomes in negotiations and fostering the growth and success of their ventures.

Question 5

Does the presence of specific investors or guests on the show influence entrepreneurs' deal success and viewership, and who has the most significant impact on both?

Analyzing the influence of specific investors or guests is crucial to entrepreneurs for tailoring pitches. This information could be valuable for the producers of Shark Tank too, as it indicates which guests have a more substantial and consistent influence on the show's success in terms of attracting and retaining viewers.

4. ABOUT THE DATASETS¶

We will be making use of two datasets. The first dataset consolidates data from seasons 1 to 14 of the American business reality series, Shark Tank. It consists of 1274 rows and 50 columns with each row representing a different pitch made on Shark Tank. These fields present diverse details regarding the pitch, entrepreneurs, and deals formulated within the episodes.

  • Key Details:
  1. Season & Episode Information: Details about seasons, episode numbers, air dates, etc.
  2. Entrepreneur & Startup Details: Various attributes such as startup name, industry, entrepreneur demographics, and business descriptions.
  3. Viewership Metrics: Viewer metrics like US viewership.
  4. Investment Asks & Deals: Data regarding initial asks, deals secured, investment amounts, and valuation figures.
  5. Shark Investment Specifics: Information about individual sharks’ investments, equity stakes, and presence in episodes.
  6. Deal Dynamics: Includes deal details like the number of sharks in a deal, individual shark equity, and investments.

The second dataset provides a comprehensive record of pitches made on Shark Tank. It contains 5 columns and 706 rows with each row representing a pitch. This dataset spans across eight seasons, offering valuable insights into the factors that influence the success of these pitches and the subsequent investment decisions by the sharks. This dataset contains the following headers:

  • Dataset Details:
  1. Season_Epi_code: This field indexes the data across all 8 seasons of Shark Tank (US) using a code in the format of SEE (101 = 1st season 1st Episode, 826 = 8th season 26th Episode). It helps identify the season and episode for referencing and analysis.
  2. Pitched_Business_Identifier: This column provides a short name or identifier for the businesses pitched on the show.
  3. Pitched_Business_Desc: A brief description of the pitched business is included in this column. It combines text from various sources, and while there may be some repetition or short descriptions, it serves as a textual overview of each business.
  4. Deal_Status: This field indicates the status of the pitched business, specifically whether the pitch resulted in a deal in the episode. It is formatted as YES (1) for deals that were agreed upon by at least one shark and the presenters, and NO (0) for pitches that did not secure a deal.
  5. Deal_Shark: In this column, you will find information about which of the most common sharks participated in the episode and agreed on a deal with the presenters. The format includes either a single shark's initials or a list of multiple shark initials separated by '+'.

This dataset complements our previous dataset, further enhancing our ability to analyze and understand the dynamics of pitches on "Shark Tank." With this additional data, we can explore how specific sharks' interests and the business descriptions influence investment decisions, helping us build a more comprehensive picture of the show's outcomes.

Dataset 1 - https://www.kaggle.com/datasets/thirumani/shark-tank-us-dataset/data
Dataset 2 - https://www.kaggle.com/datasets/neiljs/all-shark-tank-us-pitches-deals

Both datasets are available on Kaggle and are linked above. We will download the datasets from Kaggle and import them into our Jupyter Notebook environment for analysis.

5. DATA COLLECTION¶

5.1 Import Required Libraries¶

Environment Setup:

  • In order to run the notebook in Google Colab or Jupyter Notebook, please ensure that you have installed the following modules:
    1. nbformat
    2. textstat
    3. wordcloud
    4. imageio.v2
    5. plotly
    6. nltk
      To install, excute command like 'pip install textstat' in the anacoda terminal.
  • To remove the warning of 'CryptographyDeprecationWarning: Blowfish has been deprecated', upgrade paramiko to latest release (2.12.0)
    To fix the issue, excute 'conda update paramiko' in the anacoda terminal.
In [1]:
import pandas as pd
import numpy as np
import re
from datetime import datetime
import matplotlib.pyplot as plt
import imageio #reading and writing image data
import seaborn as sns
import itertools
from PIL import Image #open and save images

import plotly.graph_objects as go #make interactive graphs
import plotly.express as px
from plotly.subplots import make_subplots #display multiple plots in one figure
import plotly.figure_factory as ff
#To corretly show the plotly graph in html
import plotly.io as pio
#pio.renderers.default = 'notebook'

import nltk #used to analyze data written by humans
from nltk.stem import WordNetLemmatizer #make sensible words out of uncleaned data
from nltk.corpus import stopwords #filter out common words

from sklearn.feature_extraction.text import TfidfVectorizer #convert raw documents to a TF-IDF matrix
from wordcloud import WordCloud #generate a word cloud
from textblob import TextBlob #processing textual data
import textstat #calculating textual statistics including readability scores

nltk.download('stopwords') #one time installation
nltk.download('wordnet') #one time installation

#import libraries needed for getting images from web
from PIL import Image
import requests
from io import BytesIO
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/siddharthkulkarni/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/siddharthkulkarni/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
In [2]:
print("Notebook last executed on:", datetime.now().strftime("%m/%d/%Y, %H:%M:%S"))
Notebook last executed on: 12/06/2023, 19:52:00

5.2 Importing the first dataset¶

In [3]:
# import data from csv file present in GitHub repository using pandas
path = 'https://raw.githubusercontent.com/JoyceGaoH/project-shark/main/Shark%20Tank%20US%20dataset.csv' # save github repository url
df_shark_tank_1 = pd.read_csv(path)
df_shark_tank_1.head()
Out[3]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Startup Name Industry Business Description Pitchers Gender ... Kevin O Leary Investment Equity Guest Investment Amount Guest Investment Equity Guest Name Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present
0 1 9-Aug-09 5-Feb-10 1 1 9-Aug-09 AvaTheElephant Health/Wellness Ava The Elephant - Baby and Child Care Female ... NaN NaN NaN NaN 1.0 0.0 0.0 1.0 1.0 1.0
1 1 9-Aug-09 5-Feb-10 1 2 9-Aug-09 Mr.Tod'sPieFactory Food and Beverage Mr. Tod's Pie Factory - Specialty Food Male ... NaN NaN NaN NaN 1.0 0.0 0.0 1.0 1.0 1.0
2 1 9-Aug-09 5-Feb-10 1 3 9-Aug-09 Wispots Business Services Wispots - Consumer Services Male ... NaN NaN NaN NaN 1.0 0.0 0.0 1.0 1.0 1.0
3 1 9-Aug-09 5-Feb-10 1 4 9-Aug-09 CollegeFoxesPackingBoxes Lifestyle/Home College Foxes Packing Boxes - Consumer Services Male ... NaN NaN NaN NaN 1.0 0.0 0.0 1.0 1.0 1.0
4 1 9-Aug-09 5-Feb-10 1 5 9-Aug-09 IonicEar Software/Tech Ionic Ear - Novelties Male ... NaN NaN NaN NaN 1.0 0.0 0.0 1.0 1.0 1.0

5 rows × 50 columns

5.2 import the second dataset:¶

In [4]:
path2 = 'https://raw.githubusercontent.com/JoyceGaoH/project-shark/main/Sharktankpitchesdeals.csv'
df_shark_tank_2 = pd.read_csv(path2)
df_shark_tank_2.head()
Out[4]:
Season_Epi_code Pitched_Business_Identifier Pitched_Business_Desc Deal_Status Deal_Shark
0 826 Bridal Buddy a functional slip worn under a wedding gown th... 1 KOL+LG
1 826 Laid Brand hair-care products made with pheromones . Laid... 0 NaN
2 826 Rocketbook a notebook that can scan contents to cloud ser... 0 NaN
3 826 Wine & Design painting classes with wine served . Wine & Des... 1 KOL
4 824 Peoples Design a mixing bowl with a built-in scoop . Peoples ... 1 LG

6. EXPLORATORY DATA ANALYSIS ¶

In [5]:
df_shark_tank_1.shape
Out[5]:
(1274, 50)
In [6]:
df_shark_tank_2.shape
Out[6]:
(706, 5)

The second datasets seems to have lesser number of observations as comapred to the first one. We observed that the 2nd dataset has records for 8 seasons, whereas, the 1st datasets has records for 14.

In [7]:
df_shark_tank_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1274 entries, 0 to 1273
Data columns (total 50 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   Season Number                       1274 non-null   int64  
 1   Season Start                        1274 non-null   object 
 2   Season End                          1274 non-null   object 
 3   Episode Number                      1274 non-null   int64  
 4   Pitch Number                        1274 non-null   int64  
 5   Original Air Date                   1274 non-null   object 
 6   Startup Name                        1274 non-null   object 
 7   Industry                            1274 non-null   object 
 8   Business Description                1274 non-null   object 
 9   Pitchers Gender                     1267 non-null   object 
 10  Pitchers City                       502 non-null    object 
 11  Pitchers State                      746 non-null    object 
 12  Pitchers Average Age                338 non-null    object 
 13  Entrepreneur Names                  779 non-null    object 
 14  Company Website                     516 non-null    object 
 15  Multiple Entrepreneurs              847 non-null    float64
 16  US Viewership                       1274 non-null   float64
 17  Original Ask Amount                 1274 non-null   int64  
 18  Original Offered Equity             1274 non-null   float64
 19  Valuation Requested                 1274 non-null   int64  
 20  Got Deal                            1274 non-null   int64  
 21  Total Deal Amount                   765 non-null    float64
 22  Total Deal Equity                   765 non-null    float64
 23  Deal Valuation                      765 non-null    float64
 24  Number of sharks in deal            765 non-null    float64
 25  Investment Amount Per Shark         765 non-null    float64
 26  Equity Per Shark                    765 non-null    float64
 27  Royalty Deal                        75 non-null     float64
 28  Loan                                52 non-null     float64
 29  Barbara Corcoran Investment Amount  120 non-null    float64
 30  Barbara Corcoran Investment Equity  120 non-null    float64
 31  Mark Cuban Investment Amount        230 non-null    float64
 32  Mark Cuban Investment Equity        230 non-null    float64
 33  Lori Greiner Investment Amount      199 non-null    float64
 34  Lori Greiner Investment Equity      199 non-null    float64
 35  Robert Herjavec Investment Amount   121 non-null    float64
 36  Robert Herjavec Investment Equity   121 non-null    float64
 37  Daymond John Investment Amount      111 non-null    float64
 38  Daymond John Investment Equity      111 non-null    float64
 39  Kevin O Leary Investment Amount     117 non-null    float64
 40  Kevin O Leary Investment Equity     117 non-null    float64
 41  Guest Investment Amount             105 non-null    float64
 42  Guest Investment Equity             105 non-null    float64
 43  Guest Name                          105 non-null    object 
 44  Barbara Corcoran Present            898 non-null    float64
 45  Mark Cuban Present                  901 non-null    float64
 46  Lori Greiner Present                901 non-null    float64
 47  Robert Herjavec Present             897 non-null    float64
 48  Daymond John Present                898 non-null    float64
 49  Kevin O Leary Present               898 non-null    float64
dtypes: float64(31), int64(6), object(13)
memory usage: 497.8+ KB

A deep-dive into the dataset's columns:

  1. Journey across Seasons

The journey begins by navigating through different seasons, with each season having their own sets of challenges, successes and new learnings. We discover how the atmosphere of entrepreneurial initiatives evolves over time.

  1. Enterpreneurial Profiles

The world of entrepreneurship is full of diverse personalities, each with an unique history. 'Pitchers Gender,' 'Pitchers City,' and 'Pitchers State' depict a clear image of the entrepreneurs, highlighting the gender and geographic diversity of people looking to create an impact.

  1. Investment Strategies

Pitches that are successful emphasize strategic financial moves and the skillful negotiations to get investments by the enterpreneurs.

  1. Dynamics of the Sharks

Shark engagement, visible by their presence, highlights the strategic partnerships that investors and entrepreneurs build, which have an impact on business prospects.

  1. Financial Achievements

The results of the show gives insights about the business profiles as well as the difficulties entrepreneurs face when seeking investment, providing information about the difficulties faced by businesses in the real world.

In [8]:
df_shark_tank_2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 706 entries, 0 to 705
Data columns (total 5 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   Season_Epi_code              706 non-null    int64 
 1   Pitched_Business_Identifier  706 non-null    object
 2   Pitched_Business_Desc        706 non-null    object
 3   Deal_Status                  706 non-null    int64 
 4   Deal_Shark                   383 non-null    object
dtypes: int64(2), object(3)
memory usage: 27.7+ KB

A brief look into the 2nd dataset's columns:

  • Through their unique personalities, entrepreneurs share a variety of innovative and ambitious stories in the lively episodes of Shark Tank. A fascinating story of entrepreneurial determination is created as deals are secured and strategic partnerships are formed.
In [9]:
df_shark_tank_1.describe().applymap('{:,.2f}'.format)
Out[9]:
Season Number Episode Number Pitch Number Multiple Entrepreneurs US Viewership Original Ask Amount Original Offered Equity Valuation Requested Got Deal Total Deal Amount ... Kevin O Leary Investment Amount Kevin O Leary Investment Equity Guest Investment Amount Guest Investment Equity Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present
count 1,274.00 1,274.00 1,274.00 847.00 1,274.00 1,274.00 1,274.00 1,274.00 1,274.00 765.00 ... 117.00 117.00 105.00 105.00 898.00 901.00 901.00 897.00 898.00 898.00
mean 7.92 12.52 637.50 0.44 5.14 284,137.36 13.80 3,550,595.48 0.60 296,062.96 ... 240,747.86 15.11 212,293.65 15.59 0.56 0.90 0.75 0.88 0.66 0.96
std 3.72 7.47 367.92 0.50 1.48 359,005.10 8.64 5,878,462.11 0.49 358,828.25 ... 300,652.14 11.23 211,753.55 13.35 0.50 0.30 0.43 0.33 0.47 0.21
min 1.00 1.00 1.00 0.00 2.27 10,000.00 1.00 40,000.00 0.00 10,000.00 ... 20,000.00 0.00 20,000.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
25% 5.00 6.00 319.25 0.00 3.85 100,000.00 10.00 666,667.00 0.00 100,000.00 ... 83,333.33 6.00 75,000.00 8.75 0.00 1.00 1.00 1.00 0.00 1.00
50% 8.00 12.00 637.50 0.00 4.88 200,000.00 10.00 1,500,000.00 1.00 200,000.00 ... 150,000.00 10.00 125,000.00 11.00 1.00 1.00 1.00 1.00 1.00 1.00
75% 11.00 19.00 955.75 1.00 6.39 350,000.00 20.00 4,000,000.00 1.00 350,000.00 ... 270,000.00 20.00 250,000.00 20.00 1.00 1.00 1.00 1.00 1.00 1.00
max 14.00 29.00 1,274.00 1.00 8.64 5,000,000.00 100.00 100,000,000.00 1.00 5,000,000.00 ... 2,500,000.00 50.00 1,250,000.00 100.00 1.00 1.00 1.00 1.00 1.00 1.00

8 rows × 37 columns

This indicates high expectations or confidence in their businesses, as they are likely in early stages of development. The relatively high valuation requests might also reflect the entrepreneurs' understanding of negotiation dynamics on the show, where sharks often counter-offer with lower valuations. The fact that the average deal amount is higher than the median ask could imply that sharks are willing to invest more in ventures they see as highly promising. It might also suggest that entrepreneurs who ask for reasonable or slightly lower amounts are more likely to get a deal, possibly with better terms.

The average viewership of about 5.14 million U.S. viewers per episode highlights the show's popularity and wide appeal. This can be attributed to the educational and entertainment value it provides, offering insights into entrepreneurship, investment negotiations, and business strategies. High viewership also means greater exposure for the businesses that pitch, which can be valuable in itself, independent of whether they secure a deal.

In [10]:
df_shark_tank_2.describe().applymap('{:,.2f}'.format)
Out[10]:
Season_Epi_code Deal_Status
count 706.00 706.00
mean 519.65 0.54
std 213.60 0.50
min 101.00 0.00
25% 405.00 0.00
50% 523.00 1.00
75% 709.00 1.00
max 826.00 1.00

Upon having a look at the .info() and .describe() function outputs for majority of the columns present in both datasets, which cover a lot of statistical information, we infer that these results are not that useful and not much can be derived out of them.
Therefore, a deeper analysis is required with the help of visualizations.

In [11]:
# Frequency Counts for Categorical Variables
df_shark_tank_1['Industry'].value_counts()
Out[11]:
Food and Beverage          276
Lifestyle/Home             228
Fashion/Beauty             217
Children/Education         118
Fitness/Sports/Outdoors    113
Health/Wellness             65
Software/Tech               65
Pet Products                51
Business Services           37
Media/Entertainment         24
Uncertain/Other             18
Automotive                  17
Electronics                 15
Green/CleanTech             11
Travel                      11
Liquor/Alcohol               8
Name: Industry, dtype: int64

Business proposals from Shark Tank are categorized into several categories, exposing patterns in investor interest and entrepreneurial focus. Food and Beverage is the most popular category, with 276 pitches, suggesting a high likeliness towards entrepreneurship in this industry, perhaps because of its relevance to consumers and wide market appeal. Lifestyle/Home and Fashion/Beauty trail closely after, indicating a notable inclination towards consumer products and services that improve everyday life and individual appearance.

A noteworthy presence of categories such as Children/Education and Fitness/Sports/Outdoors is indicative of the cultural emphasis on wellness, health, and education. Software/Tech and Health/Wellness are interestingly intertwined, which may suggest a balanced interest in both health-related and technical progress. The lower volume of pitches in specialized categories such as Travel, Liquor/Alcohol, and Green/CleanTech may be due to a variety of factors such as market size, perceptions of risk and investor expertise.

In [12]:
# Display the number of pitchers by gender and teams
gender_teams = df_shark_tank_1['Pitchers Gender'].value_counts()
print(gender_teams)
# create a figure and set different background
Male          703
Female        330
Mixed Team    234
Name: Pitchers Gender, dtype: int64

The Shark Tank show's gender distribution data shows distinct patterns in the representation of entrepreneurs: there are 234 pitches from mixed-gender teams and 703 male pitchers, a substantial majority over the 330 female pitchers. This draws attention to the gender gap in the entrepreneurial field, pointing to a higher participation rate among men and possible obstacles for female entrepreneurs using these platforms. Mixed teams demonstrate cross-gender cooperation. The gender dynamics and biases that are prevalent in the investment and entrepreneurship fields are highlighted by this data.

7. DATA PREPROCESSING SUMMARY / OVERVIEW¶

  1. Null Value Analysis:

    The isnull().sum() method is used to identify the total number of null values in each column of the datasets df_shark_tank_1 and df_shark_tank_2.

  2. Data Type Conversion: Several columns in df_shark_tank_1 are converted to appropriate data types:

    • Investment-related columns to float.

    • Date columns to datetime.

    • Season, episode, and pitch numbers to integers.

    • Startup name, industry, and business description to string.

    • Multiple Entrepreuners column is converted to integer, potentially indicating a binary or categorical nature.

  3. Handling Missing Data:

    • Rows with null values in 'Pitchers Gender' are dropped.

    • Certain columns are dropped (e.g., 'Royalty Deal', 'Loan') due to irrelevance to the analysis.

    • Columns expected to contain textual information (e.g., 'Pitchers City', 'Entrepreneur Names') are filled with 'Unknown' when null.

    • Numeric columns are filled with 0.0 when null, indicating either a lack of investment or non-applicability of the metric.

  4. Dataset Indexing, Modification and Merging:

    • The set_index() method is used to set 'Startup Name' as the index for df_st1 (modified df_shark_tank_1).

    • A subset of df_shark_tank_2 is created focusing on business identifiers and descriptions.

    • Company names are standardized by converting to lowercase and removing whitespace to facilitate merging.

    • A left merge is performed between df_st1 and the modified df_shark_tank_2 (df_shark_tank_3), ensuring the preservation of df_shark_tank_1's data.

    • Business descriptions from both datasets are combined, and the index is set to the standardized company name.

Thorough data processing and cleaning phase sets a strong foundation for subsequent data analysis, ensuring that the analysis is conducted on reliable and well-structured data.

8. DATA PREPROCESSING¶

8.1 Data Cleaning¶

In [13]:
df_shark_tank_1.isnull().sum() # show the total number of null values per column
Out[13]:
Season Number                            0
Season Start                             0
Season End                               0
Episode Number                           0
Pitch Number                             0
Original Air Date                        0
Startup Name                             0
Industry                                 0
Business Description                     0
Pitchers Gender                          7
Pitchers City                          772
Pitchers State                         528
Pitchers Average Age                   936
Entrepreneur Names                     495
Company Website                        758
Multiple Entrepreneurs                 427
US Viewership                            0
Original Ask Amount                      0
Original Offered Equity                  0
Valuation Requested                      0
Got Deal                                 0
Total Deal Amount                      509
Total Deal Equity                      509
Deal Valuation                         509
Number of sharks in deal               509
Investment Amount Per Shark            509
Equity Per Shark                       509
Royalty Deal                          1199
Loan                                  1222
Barbara Corcoran Investment Amount    1154
Barbara Corcoran Investment Equity    1154
Mark Cuban Investment Amount          1044
Mark Cuban Investment Equity          1044
Lori Greiner Investment Amount        1075
Lori Greiner Investment Equity        1075
Robert Herjavec Investment Amount     1153
Robert Herjavec Investment Equity     1153
Daymond John Investment Amount        1163
Daymond John Investment Equity        1163
Kevin O Leary Investment Amount       1157
Kevin O Leary Investment Equity       1157
Guest Investment Amount               1169
Guest Investment Equity               1169
Guest Name                            1169
Barbara Corcoran Present               376
Mark Cuban Present                     373
Lori Greiner Present                   373
Robert Herjavec Present                377
Daymond John Present                   376
Kevin O Leary Present                  376
dtype: int64

8.2 Typecasting a few columns¶

In [14]:
# Change columns to float type
df_shark_tank_1['Guest Investment Amount'] = df_shark_tank_1['Guest Investment Amount'].astype(float)
df_shark_tank_1['Guest Investment Equity'] = df_shark_tank_1['Guest Investment Equity'].astype(float)

# Change columns to datetime type
df_shark_tank_1["Season Start"]=pd.to_datetime(df_shark_tank_1["Season Start"])
df_shark_tank_1["Season End"]=pd.to_datetime(df_shark_tank_1["Season End"])
df_shark_tank_1["Original Air Date"]=pd.to_datetime(df_shark_tank_1["Original Air Date"])

# Change columns to integer type
df_shark_tank_1['Season Number'] = df_shark_tank_1['Season Number'].astype(pd.Int32Dtype())
df_shark_tank_1['Episode Number'] = df_shark_tank_1['Episode Number'].astype(pd.Int32Dtype())
df_shark_tank_1['Pitch Number'] = df_shark_tank_1['Pitch Number'].astype(pd.Int32Dtype())

# Change columns to string type
df_shark_tank_1['Startup Name'] = df_shark_tank_1['Startup Name'].astype(str)
df_shark_tank_1['Industry'] = df_shark_tank_1['Industry'].astype(str)
df_shark_tank_1['Business Description'] = df_shark_tank_1['Business Description'].astype(str)
df_shark_tank_1['Multiple Entrepreneurs'] = df_shark_tank_1['Multiple Entrepreneurs'].astype(pd.Int32Dtype()) # integer type

8.3 Fill or drop all NaNs with 0 or appropriate text value for clarification¶

In [15]:
# since this column has only 7 rows with null values, it makes sense to drop those obsevations
df_shark_tank_1.dropna(subset=['Pitchers Gender'],inplace=True)

# dropping the columns 'Royalty Deal' and 'Loan' since they are not relevant to our analysis
df_shark_tank_1.drop(['Royalty Deal','Loan'], axis=1, inplace=True)
In [16]:
# filling columns with unknown which have null values
columns_to_fill_unknown=['Pitchers City', 'Pitchers State', 'Pitchers Average Age','Entrepreneur Names', 'Guest Name', 'Company Website']

# filling columns with value as 0.0 which have null values
columns_to_fill_0=['Multiple Entrepreneurs', 'Total Deal Amount', 'Total Deal Equity',
               'Deal Valuation', 'Number of sharks in deal', 'Investment Amount Per Shark', 'Equity Per Shark',
                'Barbara Corcoran Investment Equity',
               'Mark Cuban Investment Equity',  'Lori Greiner Investment Equity',
               'Robert Herjavec Investment Equity',
               'Daymond John Investment Equity', 'Kevin O Leary Investment Equity',
                'Guest Investment Equity', 'Barbara Corcoran Present', 'Mark Cuban Present',
               'Lori Greiner Present', 'Robert Herjavec Present', 'Daymond John Present', 'Kevin O Leary Present']

The following variables shall not be filled with nulls eventhough they have multiple null values in them since it aids in easier analysis later

  • 'Barbara Corcoran Investment Amount'
  • 'Mark Cuban Investment Amount'
  • 'Lori Greiner Investment Amount'
  • 'Robert Herjavec Investment Amount'
  • 'Daymond John Investment Amount'
  • 'Kevin O Leary Investment Amount'
  • 'Guest Investment Amount'
In [17]:
# filling null values
df_st1=df_shark_tank_1.apply(lambda x: x.fillna(0.0) if x.name in columns_to_fill_0 else x.fillna('Unknown') if x.name in columns_to_fill_unknown else x)
df_st1.head()
Out[17]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Startup Name Industry Business Description Pitchers Gender ... Kevin O Leary Investment Equity Guest Investment Amount Guest Investment Equity Guest Name Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present
0 1 2009-08-09 2010-02-05 1 1 2009-08-09 AvaTheElephant Health/Wellness Ava The Elephant - Baby and Child Care Female ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
1 1 2009-08-09 2010-02-05 1 2 2009-08-09 Mr.Tod'sPieFactory Food and Beverage Mr. Tod's Pie Factory - Specialty Food Male ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
2 1 2009-08-09 2010-02-05 1 3 2009-08-09 Wispots Business Services Wispots - Consumer Services Male ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
3 1 2009-08-09 2010-02-05 1 4 2009-08-09 CollegeFoxesPackingBoxes Lifestyle/Home College Foxes Packing Boxes - Consumer Services Male ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
4 1 2009-08-09 2010-02-05 1 5 2009-08-09 IonicEar Software/Tech Ionic Ear - Novelties Male ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0

5 rows × 48 columns

In [18]:
df_shark_tank_2.isnull().sum() # show the total number of null values per column
Out[18]:
Season_Epi_code                  0
Pitched_Business_Identifier      0
Pitched_Business_Desc            0
Deal_Status                      0
Deal_Shark                     323
dtype: int64
In [19]:
df_shark_tank_2['Deal_Shark'].fillna('No Deal Made', inplace=True)

8.4 Data Transformation¶

Since each business/startup has a unique and creative name, we set it as the index of the dataframe.

In [20]:
# Setting the index to Startup Name
df_st1.set_index(['Startup Name'], inplace=True)
df_st1.head()
Out[20]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Industry Business Description Pitchers Gender Pitchers City ... Kevin O Leary Investment Equity Guest Investment Amount Guest Investment Equity Guest Name Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present
Startup Name
AvaTheElephant 1 2009-08-09 2010-02-05 1 1 2009-08-09 Health/Wellness Ava The Elephant - Baby and Child Care Female Atlanta ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
Mr.Tod'sPieFactory 1 2009-08-09 2010-02-05 1 2 2009-08-09 Food and Beverage Mr. Tod's Pie Factory - Specialty Food Male Somerset ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
Wispots 1 2009-08-09 2010-02-05 1 3 2009-08-09 Business Services Wispots - Consumer Services Male Cary ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
CollegeFoxesPackingBoxes 1 2009-08-09 2010-02-05 1 4 2009-08-09 Lifestyle/Home College Foxes Packing Boxes - Consumer Services Male Tampa ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0
IonicEar 1 2009-08-09 2010-02-05 1 5 2009-08-09 Software/Tech Ionic Ear - Novelties Male St. Paul ... 0.0 NaN 0.0 Unknown 1.0 0.0 0.0 1.0 1.0 1.0

5 rows × 47 columns

In [21]:
# Create a copy of dataset 2 and extract the text columns
df_shark_tank_3 = df_shark_tank_2.loc[:,('Pitched_Business_Identifier','Pitched_Business_Desc')]
# Convert all the company name to lower case to match with dataset 1
df_shark_tank_3['Pitched_Business_Identifier_m'] = df_shark_tank_2['Pitched_Business_Identifier'].str.lower()
# Remove all the white space in company names
df_shark_tank_3['Name'] = df_shark_tank_3['Pitched_Business_Identifier_m'].str.replace('\s','',regex=True)
df_shark_tank_3.head()
Out[21]:
Pitched_Business_Identifier Pitched_Business_Desc Pitched_Business_Identifier_m Name
0 Bridal Buddy a functional slip worn under a wedding gown th... bridal buddy bridalbuddy
1 Laid Brand hair-care products made with pheromones . Laid... laid brand laidbrand
2 Rocketbook a notebook that can scan contents to cloud ser... rocketbook rocketbook
3 Wine & Design painting classes with wine served . Wine & Des... wine & design wine&design
4 Peoples Design a mixing bowl with a built-in scoop . Peoples ... peoples design peoplesdesign

Convert the company name in both dataset to lower case as the cloumn to join on.

In [22]:
# Convert all the company name to lower case to match with dataset 2
df_st1['Name'] = df_st1.index.str.lower()
# Perform left merge on the two dataset, preserve the data in dataset 1
df_shark_tank_merged = df_st1.merge(df_shark_tank_3,how='left',on='Name')
df_shark_tank_merged.head()
Out[22]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Industry Business Description Pitchers Gender Pitchers City ... Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present Name Pitched_Business_Identifier Pitched_Business_Desc Pitched_Business_Identifier_m
0 1 2009-08-09 2010-02-05 1 1 2009-08-09 Health/Wellness Ava The Elephant - Baby and Child Care Female Atlanta ... 1.0 0.0 0.0 1.0 1.0 1.0 avatheelephant Ava the Elephant (Emmy the Elephant during show, trademarked a... ava the elephant
1 1 2009-08-09 2010-02-05 1 2 2009-08-09 Food and Beverage Mr. Tod's Pie Factory - Specialty Food Male Somerset ... 1.0 0.0 0.0 1.0 1.0 1.0 mr.tod'spiefactory Mr. Tod's Pie Factory a pie company mr. tod's pie factory
2 1 2009-08-09 2010-02-05 1 3 2009-08-09 Business Services Wispots - Consumer Services Male Cary ... 1.0 0.0 0.0 1.0 1.0 1.0 wispots Wispots an electronic hand-held device for waiting roo... wispots
3 1 2009-08-09 2010-02-05 1 4 2009-08-09 Lifestyle/Home College Foxes Packing Boxes - Consumer Services Male Tampa ... 1.0 0.0 0.0 1.0 1.0 1.0 collegefoxespackingboxes College Foxes Packing Boxes a packing and organizing service based on an a... college foxes packing boxes
4 1 2009-08-09 2010-02-05 1 5 2009-08-09 Software/Tech Ionic Ear - Novelties Male St. Paul ... 1.0 0.0 0.0 1.0 1.0 1.0 ionicear Ionic Ear an implantable Bluetooth device requiring surg... ionic ear

5 rows × 51 columns

Since the longer pithes discription in the second dataset only contains Business Description information from season 1-8, we choose to combine the short Business Description from dataset one together with the longer description to one column.

In [23]:
# Merge the long description with the shorter discription into one column
df_shark_tank_merged['Business Description']=df_shark_tank_merged['Business Description'].fillna('').map(str)+'-'+df_shark_tank_merged['Pitched_Business_Desc'].fillna('').map(str)
df_shark_tank_merged['Name'] = df_shark_tank_merged['Pitched_Business_Identifier'].str.replace('\s','',regex=True)
#Setting the index of the dataframe as Name
df_shark_tank_merged.set_index(['Name'],inplace=True)
df_shark_tank_merged.head()
Out[23]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Industry Business Description Pitchers Gender Pitchers City ... Guest Name Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present Pitched_Business_Identifier Pitched_Business_Desc Pitched_Business_Identifier_m
Name
AvatheElephant 1 2009-08-09 2010-02-05 1 1 2009-08-09 Health/Wellness Ava The Elephant - Baby and Child Care- (Emmy ... Female Atlanta ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Ava the Elephant (Emmy the Elephant during show, trademarked a... ava the elephant
Mr.Tod'sPieFactory 1 2009-08-09 2010-02-05 1 2 2009-08-09 Food and Beverage Mr. Tod's Pie Factory - Specialty Food-a pie c... Male Somerset ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Mr. Tod's Pie Factory a pie company mr. tod's pie factory
Wispots 1 2009-08-09 2010-02-05 1 3 2009-08-09 Business Services Wispots - Consumer Services-an electronic hand... Male Cary ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Wispots an electronic hand-held device for waiting roo... wispots
CollegeFoxesPackingBoxes 1 2009-08-09 2010-02-05 1 4 2009-08-09 Lifestyle/Home College Foxes Packing Boxes - Consumer Service... Male Tampa ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 College Foxes Packing Boxes a packing and organizing service based on an a... college foxes packing boxes
IonicEar 1 2009-08-09 2010-02-05 1 5 2009-08-09 Software/Tech Ionic Ear - Novelties-an implantable Bluetooth... Male St. Paul ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Ionic Ear an implantable Bluetooth device requiring surg... ionic ear

5 rows × 50 columns

9. DATA ANALYSIS OVERVIEW / SUMMARY¶

  1. Date and Success Rate Analysis:
  • Conversion of 'Original Air Date' to datetime format, extraction of month and year.
  • Calculation of success rates by month, season, and year.
  • Analysis of success rates and viewership trends over time.
  1. Investment Analysis:
  • Identification of companies that received investment deals.
  • Descriptive statistics for investments by each investor.
  • Analysis of co-investments among different investors, including combinations and counts of co-investments.
  • Visualization of investment patterns using line charts, heatmaps, and pie charts.
  • Focus on deals involving specific numbers of investors.
  1. Text Analysis and Visualization:
  • Cleaning and preprocessing text data from pitch descriptions.
  • Use of TF-IDF for extracting keywords from successful and unsuccessful pitches.
  • Generation of word clouds to visually represent keywords from successful and unsuccessful pitches in specific industries.
  • Sentiment analysis (polarity and subjectivity) and readability analysis of pitches.
  1. Industry-Specific Analysis:
  • Detailed analysis of specific industries like Software/Tech and Lifestyle/Home.
  • Seasonal success rates for different industries.
  • Comparison of companies and pitchers within these industries.
  1. Guest Analysis:
  • Correction and standardization of guest names.
  • Analysis of guests' impact on successful deals and viewership.
  • Visualization of guests' contribution to the show in terms of deal success and viewership.

This analysis employs a range of techniques including data cleaning and preprocessing, statistical analysis, visualization, text analysis (including NLP techniques like sentiment analysis and TF-IDF), all to derive insights from the Shark Tank dataset. The analysis provides a multifaceted view of the show's dynamics, from investment patterns and success rates to the textual analysis of pitches and the impact of guest appearances.

10. RESEARCH QUESTIONS¶

Question 1:¶

Can we identify temporal patterns in pitch success on the show and how do they evolve over the seasons, including viewership trends?

In [24]:
# Converting 'Original Air Date' to datetime format and extracting month and year
df_shark_tank_1['Original Air Date'] = pd.to_datetime(df_shark_tank_1['Original Air Date'])
df_shark_tank_1['Month'] = df_shark_tank_1['Original Air Date'].dt.month
df_shark_tank_1['Year'] = df_shark_tank_1['Original Air Date'].dt.year

# Finding if a deal has been secured
df_shark_tank_1['Success'] = df_shark_tank_1['Got Deal'] == 1

# Creating a variable num_to_percentage that will be used to convert the rate into percentage
num_to_percentage = 100

# Group by month and calculate success rate
monthly_success_rate = df_shark_tank_1.groupby('Month')['Success'].mean()*num_to_percentage

# Group by season and calculate success rate
seasonal_success_rate = df_shark_tank_1.groupby('Season Number')['Success'].mean()*num_to_percentage
In [25]:
# Create subplots with 3 vertical line charts
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(8,10))

# Plotting the success rate by month
ax1.plot(monthly_success_rate.index, monthly_success_rate.values)
ax1.set_title('Success Rate by Month')
ax1.set_xlabel('Month')
ax1.set_ylabel('Success Rate (in %)')
ax1.set_xticks(range(1, 13))
ax1.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

# Plotting the success rate by season
ax2.plot(seasonal_success_rate.index, seasonal_success_rate.values)
ax2.set_title('Success Rate by Season')
ax2.set_xlabel('Season Number')
ax2.set_xticks(range(0, 15))
ax2.set_xticklabels(['0','1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14'])
ax2.set_ylabel('Success Rate (in %)')

# Grouping by year and calculate average viewership
yearly_viewership = df_shark_tank_1.groupby('Year')['US Viewership'].mean()

# Plotting the viewership trend over the years
ax3.plot(yearly_viewership.index, yearly_viewership.values)
ax3.set_title('Average Yearly Viewership Trends')
ax3.set_xlabel('Year')
ax3.set_ylabel('Average Viewership (in Millions)')

# Making a few adjustments to ensure all line charts are evenly spaced
plt.tight_layout()
plt.subplots_adjust(hspace=0.4)
plt.show()

Figure 10.1: Line Graph based distribution for the average viewership trends over the years, and their corresponding success rates.

The Success Rate by Month shows a marked increase in December, aligning with the end of the fiscal year, when investors are inclined to utilize remaining budgets. It's also a time of heightened consumer activity due to the holiday season, which may positively influence investment decisions. Conversely, the success rate dips during the summer months of July and August, a period typically associated with a slowdown in business activity. This pattern reflects a correlation between investment decisions and established fiscal and seasonal trends.

The Success Rate by Season indicates an upward trajectory, with a higher percentage of pitches securing deals in later seasons of the show. This increase suggests an increase in entrepreneurs and also the quality of their presentations have gotten more effective over time, or that the investors are more inclined to engage in deals with entrepreneurs indicating a rise in confidence as the series evolves. The variations seen in certain seasons reflect the dynamic nature of investment and the ever-changing strategies of both the pitchers and sharks. The data demonstrates a clear enhancement in the show’s capacity to facilitate successful investments as seasons advance.

The Average Yearly Viewership trend shows a peak in 2014 signifies Shark Tank's apex in popularity during its initial years, capturing a large television audience. The drop in viewership since then aligns with the massive shift in audience preferences towards on-demand streaming services, ever since COVID and the diversification of entertainment options, which has impacted traditional TV ratings across the board. Yet, the show's enduring presence confirms its core appeal and the loyalty of its audience. Now, interesting pitches from the show are now being posted in YouTube as clips and Shark Tank has even partnered with streaming platforms like Hulu and Amazon Prime indicating they have swiftly adapted to this audience shift.

Answer to Question 1:¶

December emerges as the most successful month, likely influenced by fiscal year-end and holiday factors. The seasonal success rate shows an upward trend over seasons, suggesting a correlation between pitch quality and signing success. However, the peak viewership in 2014 declines with the rise of on-demand streaming, emphasizing the need to adapt to evolving viewer preferences.

Question 2: ¶

Are there statistically significant co-investment patterns among the sharks, revealing insights into their investment strategies and price discrimination?

Let's start off by filtering all the startups that received a deal from atleast one shark. This operation can be performed by using the 'Got Deal' variable, which has a value of 1 when a startup has received a deal.

In [26]:
#Finding all companies that got investment deals
df_deal=df_st1[df_st1['Got Deal']==1]

Descriptive Statistics

To understand the basic composition of the data we have filtered, we shall look at some basic statistics

In [27]:
#Descriptive Statistics for Investments by Investor
df_desc_stat=df_deal[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].describe()
df_desc_stat.applymap('{:,.2f}'.format)
Out[27]:
Barbara Corcoran Investment Amount Mark Cuban Investment Amount Lori Greiner Investment Amount Robert Herjavec Investment Amount Daymond John Investment Amount Kevin O Leary Investment Amount Guest Investment Amount
count 118.00 229.00 198.00 121.00 111.00 116.00 104.00
mean 148,644.07 257,748.18 215,370.37 292,539.94 182,430.93 240,021.55 213,854.17
std 116,200.37 279,351.66 209,730.35 538,725.92 297,866.27 301,853.38 212,171.45
min 12,500.00 12,500.00 17,500.00 5,000.00 5,000.00 20,000.00 20,000.00
25% 50,000.00 75,000.00 75,000.00 100,000.00 50,000.00 82,500.00 75,000.00
50% 100,000.00 150,000.00 150,000.00 187,500.00 120,000.00 150,000.00 125,000.00
75% 200,000.00 300,000.00 268,750.00 300,000.00 215,000.00 255,000.00 250,000.00
max 700,000.00 2,000,000.00 1,175,000.00 5,000,000.00 3,000,000.00 2,500,000.00 1,250,000.00

To find any co-investment patterns, let\'s first separate the investments made by each shark into a new dataframe.

In [28]:
#Finding All Deals Invested by a particular investor
df_barb=df_deal[df_deal['Barbara Corcoran Investment Amount'].notnull()]
df_mark=df_deal[df_deal['Mark Cuban Investment Amount'].notnull()]
df_lori=df_deal[df_deal['Lori Greiner Investment Amount'].notnull()]
df_rob=df_deal[df_deal['Robert Herjavec Investment Amount'].notnull()]
df_kev=df_deal[df_deal['Daymond John Investment Amount'].notnull()]
df_daym=df_deal[df_deal['Kevin O Leary Investment Amount'].notnull()]
df_guest=df_deal[df_deal['Guest Investment Amount'].notnull()]

Now that we have separated the investments made by each shark into a separate dataframe, we shall create sets which contain the names of companies each shark has invested in

In [29]:
#Creating sets to find what companies an investor has invested in
inv_barb=set(df_barb.index)
inv_mark=set(df_mark.index)
inv_lori=set(df_lori.index)
inv_rob=set(df_rob.index)
inv_kev=set(df_kev.index)
inv_daym=set(df_daym.index)
inv_guest=set(df_guest.index)

Based on the number of sharks in the deal, we shall now see which categories have significant amount of data to compare and find any co-investment patterns

In [30]:
#Creating and Displaying a tree map to represent various numbers of sharks in deals
grouped_by_no_of_deals=df_deal.groupby(df_deal['Number of sharks in deal'])
for name,group in grouped_by_no_of_deals:
  print(f'{len(group)} companies that have a {name} of sharks that have co-invested in them')
564 companies that have a 1.0 of sharks that have co-invested in them
170 companies that have a 2.0 of sharks that have co-invested in them
17 companies that have a 3.0 of sharks that have co-invested in them
3 companies that have a 4.0 of sharks that have co-invested in them
6 companies that have a 5.0 of sharks that have co-invested in them

We can see that there are only 4 categories for us to consider for co-inestment patterns.

The deals with just 1 shark don't have any co-investments and can be ignored.

We shall be looking into each of these categories to identify the top co-investors

In [31]:
#Creating New DataFrames to hold deals with 5,4 and 3 sharks respectively
df_5_sharks=df_deal[df_deal['Number of sharks in deal']==5]
df_4_sharks=df_deal[df_deal['Number of sharks in deal']==4]
df_3_sharks=df_deal[df_deal['Number of sharks in deal']==3]

Let's now understand how we are planning to find the co-investments made by each shark.

Since we have separate dataframes where all these investors have invested and also sets of company names separately, we shall use these to perform data transformation operations and understand the co-investment patterns.

Performing a simple '&' operation on the sets will give us a union. We can use this information to determine all the companies two investors have co-invested in.

We will need to find all combinations of sets to understand any patterns in investments, therefor we shall take the help of the 'itertools' library.

In [32]:
# importing itertools.combinations to find various combinations in sets
from itertools import combinations
In [33]:
#Defining value sets and dictionaries to resolve key value pairs
list_sets_names=['inv_barb','inv_mark','inv_lori','inv_rob','inv_daym','inv_kev','inv_guest']
list_of_sets=[inv_barb,inv_lori,inv_mark,inv_daym,inv_rob,inv_kev,inv_guest]
dict_names={'inv_barb':'Barbara Corcoran',
           'inv_mark':'Mark Cuban',
           'inv_lori':'Lori Greiner',
           'inv_rob':'Robert Harjavec',
           'inv_daym':'Daymond John',
           'inv_kev':'Kevin O Leary',
           'inv_guest':'Guest'}

Since we know that there are co-investments only by 5 sharks at most at any given time, we shall start off at co-investments by any group of 5 sharks.

Let's try to understand which group of sharks huddled together the most and co-invested.

To compute the group of sharks who co-invested together a lot, let's write a method which will calculate all sets of companies a group of 'n' sharks have invested in and find out the groups which have co-invested in the most number of companies.

In [34]:
'''Method max_comb(n,list_sets,dicts)

  METHOD TO FIND THE LIST WITH SPECIFIC NUMBER OF INVESTORS WITH MAXIMUM NUMBER OF CO-INVESTMENTS


    n   --  THE NUMBER OF SHARKS WHO HAVE CO-INVESTED
    list_sets   --  LIST OF SETS WHERE EACH SHARK HAS INVESTED
    dicts   --    DICTIONARY TO RESOLVE WHICH SET BELONG TO WHICH SHARK

   return
          max_res    --  THE MAXIMUM NUMBER OF CO-INVESTMENTS DONE BY A GROUP
          max_set_str  -- A LIST OF SET OF INVESTORS WHO HAD THE MAXIMUM NUMBER OF CO-INVESTMENTS
          set_investments_to_update  --  A SET OF INVESTMENTS THAT HAVE TO REMOVED FROM SETS
'''
def max_comb( n,list_sets,dicts):
    max_res=0
    max_set_str=[]
    set_investments_to_update=set()
    #Finding all combinations from the list of sets with n selections
    list_comb=list(itertools.combinations(list_sets, r=n))
    for each in list_comb:
        #Finding out all the companies a combination of investors have co-invested in
        this_set=eval('&'.join(each))
        # Adding companies to final set to be updated
        set_investments_to_update.update(this_set)
        # Assigning maximum number and list of investors who have co-invested
        if len(this_set)>max_res:
            max_res=len(this_set)
            max_set_str=[]
            max_set_str.append(', '.join(list(dicts[i] for i in each )))
        elif len(this_set)==max_res:
            max_set_str.append(', '.join(list(dicts[i] for i in each )))
    return max_res,max_set_str,set_investments_to_update

Logically, any company in which 5 sharks have invested will also be considered when we check for companies where 4 sharks have co-invested. We have to update the investment sets to make these cases mutually exclusive.

Let's write a function to update the sets with all the companies that have already been accounted for in a previous analysis

In [35]:
'''METHOD update_investments(list_to_upd)

  METHOD TO UPDATE INVESTMENT SETS WITH LARGER NUMBER OF CO-INVESTORS TO GIVE ACCURATE SETS FOR SMALLER CO-INVESTMENTS

  list_to_upd   --  LIST OF INVESTMENT TO BE UPDATED IN INVESTMENT SETS,
                    SO ANY INVESTMENTS THAT HAVE BEEN ACCOUNTED FOR IN PREVIOUS ANALYSES DON'T EXAGERRATE CURRENT ANALYSES

  return None
'''
def update_investments(list_to_upd):
    for each in list_of_sets:
        for i in list_to_upd:
          #Update sets to discard investments
            each.discard(i)

To make our lives easier, let's also write a function which shall display our analysis in a readable format

In [36]:
'''METHOD print_max_comb(n,list_sets,dicts)

    n   --  THE NUMBER OF SHARKS WHO HAVE CO-INVESTED
    list_sets   --  LIST OF SETS WHERE EACH SHARK HAS INVESTED
    dicts   --    DICTIONARY TO RESOLVE WHICH SET BELONG TO WHICH SHARK

  METHOD TO PRINT MAXIMUM COMBINATION OF INVESTORS IN A READABLE FORMAT

  return set_investments_to_update CASCADING A SET OF INVESTMENTS THAT HAVE TO REMOVED FROM SETS FROM max_comb METHOD
'''

def print_max_comb(n,list_sets,dicts):
    number,investor_list,set_investments_to_update=max_comb(n,list_sets,dicts)
    print(f'The following groups of investors have co-invested in {number} investment(s):\n')
    for i,each in enumerate(investor_list):
        print(i+1,'. ',each)
    print('\n\nPlease note that they are the sole investors in the above stated startups')
    return set_investments_to_update
In [37]:
df_5_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: None")
Out[37]:
  Barbara Corcoran Investment Amount Mark Cuban Investment Amount Lori Greiner Investment Amount Robert Herjavec Investment Amount Daymond John Investment Amount Kevin O Leary Investment Amount Guest Investment Amount
Startup Name              
ClassroomJams 50000.000000 50000.000000 nan 50000.000000 50000.000000 50000.000000 nan
BuggyBeds 50000.000000 50000.000000 nan 50000.000000 50000.000000 50000.000000 nan
Breathometer nan 200000.000000 200000.000000 200000.000000 200000.000000 200000.000000 nan
XCraft nan 300000.000000 300000.000000 300000.000000 300000.000000 300000.000000 nan
CupBoardPro nan 20000.000000 20000.000000 nan 20000.000000 20000.000000 20000.000000
Eyewris 25000.000000 25000.000000 25000.000000 nan 25000.000000 25000.000000 nan

Table 10.2.1: Tabular representation of the companies where 5 sharks have co-invested together.

In [38]:
update_investments(print_max_comb(5,list_sets_names,dict_names))
The following groups of investors have co-invested in 2 investment(s):

1 .  Barbara Corcoran, Mark Cuban, Robert Harjavec, Daymond John, Kevin O Leary
2 .  Mark Cuban, Lori Greiner, Robert Harjavec, Daymond John, Kevin O Leary


Please note that they are the sole investors in the above stated startups
In [39]:
df_4_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: none")
Out[39]:
  Barbara Corcoran Investment Amount Mark Cuban Investment Amount Lori Greiner Investment Amount Robert Herjavec Investment Amount Daymond John Investment Amount Kevin O Leary Investment Amount Guest Investment Amount
Startup Name              
CoffeeJoulies nan nan 37500.000000 37500.000000 37500.000000 37500.000000 nan
BeeD'Vine nan 187500.000000 187500.000000 187500.000000 nan nan 187500.000000
Songlorious nan 125000.000000 nan nan 125000.000000 125000.000000 125000.000000

Table 10.2.2: Tabular representation of the companies where 4 sharks have co-invested together.

In [40]:
update_investments(print_max_comb(4,list_sets_names,dict_names))
The following groups of investors have co-invested in 1 investment(s):

1 .  Mark Cuban, Lori Greiner, Robert Harjavec, Guest
2 .  Mark Cuban, Daymond John, Kevin O Leary, Guest
3 .  Lori Greiner, Robert Harjavec, Daymond John, Kevin O Leary


Please note that they are the sole investors in the above stated startups
In [41]:
df_3_sharks[['Barbara Corcoran Investment Amount','Mark Cuban Investment Amount','Lori Greiner Investment Amount','Robert Herjavec Investment Amount','Daymond John Investment Amount','Kevin O Leary Investment Amount','Guest Investment Amount']].style.applymap(lambda x: "background-color: #9DE09E" if x>0 else "background-color: none")
Out[41]:
  Barbara Corcoran Investment Amount Mark Cuban Investment Amount Lori Greiner Investment Amount Robert Herjavec Investment Amount Daymond John Investment Amount Kevin O Leary Investment Amount Guest Investment Amount
Startup Name              
Soy-Yer-Dough nan nan nan 100000.000000 100000.000000 100000.000000 nan
FirstDefenseNasalScreen nan 250000.000000 nan 250000.000000 250000.000000 nan nan
M3GirlDesigns nan 100000.000000 100000.000000 100000.000000 nan nan nan
VelocitySigns nan 75000.000000 nan 75000.000000 nan 75000.000000 nan
PittMoss nan 200000.000000 nan 200000.000000 nan 200000.000000 nan
SharkWheel nan 75000.000000 nan 75000.000000 nan nan 75000.000000
SarahOliverHandbags nan nan 83333.333330 83333.333330 nan 83333.333330 nan
CombatFlipFlops nan 100000.000000 100000.000000 nan 100000.000000 nan nan
BeeFreeHonee 70000.000000 70000.000000 nan nan nan nan 70000.000000
Goverre nan 66666.666670 66666.666670 66666.666670 nan nan nan
QBall nan 100000.000000 100000.000000 nan nan nan 100000.000000
Grypmat nan 120000.000000 120000.000000 nan nan nan 120000.000000
SnapClips nan 50000.000000 50000.000000 nan nan nan 50000.000000
Aira nan nan 166666.666700 166666.666700 nan 166666.666700 nan
SafetyNailer nan 33333.333330 33333.333330 nan nan nan 33333.333330
FlaskyFlowers nan 25000.000000 25000.000000 nan nan 25000.000000 nan
Browndages nan 33333.333000 33333.333000 nan 33333.333000 nan nan

Table 10.2.3: Tabular representation of the companies where 3 sharks have co-invested together.

In [42]:
update_investments(print_max_comb(3,list_sets_names,dict_names))
The following groups of investors have co-invested in 4 investment(s):

1 .  Mark Cuban, Lori Greiner, Guest


Please note that they are the sole investors in the above stated startups
In [43]:
name_list=['Barbara','Lori','Mark','Daymond','Robert','Kevin','Guest']
main_list=[]
for n,i in enumerate(list_of_sets):
    sum_per_investor=0
    sub_list=[]
    for m,j in enumerate(list_of_sets):
        if i!=j:
            combined_set=i&j
            sub_list.append(len(combined_set))
            sum_per_investor+=len(combined_set)
        else:
            sub_list.append(0)
    main_list.append(sub_list)
df_2_sharks=pd.DataFrame(main_list,columns=name_list)
df_2_sharks.index=name_list
styled_data = df_2_sharks.style.background_gradient(cmap='Blues',axis=None)
styled_data
Out[43]:
  Barbara Lori Mark Daymond Robert Kevin Guest
Barbara 0 1 20 4 2 5 4
Lori 1 0 31 13 10 3 19
Mark 20 31 0 6 7 8 17
Daymond 4 13 6 0 9 1 5
Robert 2 10 7 9 0 3 1
Kevin 5 3 8 1 3 0 1
Guest 4 19 17 5 1 1 0

Please note that the above table is just a dataframe output, but styled for better understanding.

In [44]:
update_investments(print_max_comb(2,list_sets_names,dict_names))
The following groups of investors have co-invested in 31 investment(s):

1 .  Mark Cuban, Lori Greiner


Please note that they are the sole investors in the above stated startups

Diving Deep into Mark & Lori's Co-investments:

Let's look into the Co-investments of Mark and Lori, and try to understand if there are any patterns that we can observe.

Do perform this analysis, we shall first merge all the investments made by the two investors

In [45]:
#Merging Invesments Made by Mark and Lori
df_mark_lori=df_mark.reset_index(drop=True).merge(df_lori.reset_index(),how='inner').set_index('Startup Name')

Since we have all the investments made by them, let's see if they have co-invested heavily in any Industry.

We shall group the investments by Industry and find out what percentage of all the investments they made in that industry were co-investments with the other person.

In [46]:
#Grouping Investments by Industry
mark_grouped=df_mark.groupby('Industry')
lori_grouped=df_lori.groupby('Industry')
grouped = df_mark_lori.groupby('Industry')

Since we have the grouped objects now, let's unpack them into lists

In [47]:
#Creating Iterables from groupby results
name,group =zip(*grouped)
name_mark,mark_group=zip(*mark_grouped)
name_lori,lori_group=zip(*lori_grouped)

The iterable objects need to be matched by Industry and using DataFrames for the same would make it very easy.

Let's convert all the iterables into DataFrames and merge them together using Industry as a reference

In [48]:
#Making Dataframes and merging them to make the output dataframe
df_coinv=pd.DataFrame(columns=('Industry','Co-Investments'),data=grouped)
df_coinv['Co-Investments']=df_coinv['Co-Investments'].apply(len)
df_markinv=pd.DataFrame(columns=('Industry','Mark Investments'),data=mark_grouped)
df_markinv['Mark Investments']=df_markinv['Mark Investments'].apply(len)
df_loriinv=pd.DataFrame(columns=('Industry','Lori Investments'),data=lori_grouped)
df_loriinv['Lori Investments']=df_loriinv['Lori Investments'].apply(len)
df_model_op=pd.merge(df_markinv,df_loriinv, on='Industry').merge(df_coinv, on='Industry')

Now, we have the number of Investments made by each investor individually and the number of co-investments made in each Industry.

Let's see which investor has made more co-investments in each Industry than Individual Investments. This will tell us if they have tended to co-invest more in any Industry.

With the values we already have , let's derive what percentage of investments each investor has made are co-investments.

In [49]:
#Making derived columns
df_model_op['Co-investment Percentage of Mark']=df_model_op['Co-Investments']/df_model_op['Mark Investments']*100
df_model_op['Co-investment Percentage of Lori']=df_model_op['Co-Investments']/df_model_op['Lori Investments']*100

Since we now have the required values, let's discard the columns we used to calculate these values and also make the output more readable

In [50]:
#Dropping Columns which were used to calculate derived values
df_model_op.drop(columns=['Mark Investments','Lori Investments'],inplace=True)
#Setting Index for better readability
df_model_op.set_index('Industry',inplace=True)
In [51]:
# Define Indices and Bar Width
bar_width = 0.35
index = np.arange(len(df_model_op.index))

# Plot the Bars
plt.figure(figsize=(12, 6))
bar1 = plt.bar(index - bar_width/2, df_model_op['Co-investment Percentage of Mark'], bar_width, label='Mark')
bar2 = plt.bar(index + bar_width/2, df_model_op['Co-investment Percentage of Lori'], bar_width, label='Lori')

# Add labels and title
plt.xlabel('Industry')
plt.ylabel('Co-investment Percentage')
plt.title('Co-investment Percentage by Industry')
plt.xticks(ticks=index,labels = df_model_op.index, rotation=45, ha='right')  # Rotate x-axis labels for better readability

#Add annotations to highlight anomalies
plt.annotate('Mark\'s Co-Investment Percentage is higher', xy =(7-bar_width/2, 44),
                xytext =(3, 50),
                arrowprops = dict(facecolor ='black',
                                  shrink = 0.05),)
plt.annotate('Mark\'s Co-Investment Percentage is higher', xy =(2-bar_width/2, 10),
                xytext =(3, 50),
                arrowprops = dict(facecolor ='black',
                                  shrink = 0.05),)
plt.legend()

# Show the plot
plt.show()

We can observe that Lori has a higher percentage of co-investment with mark in all industries, except Lifestyle/Home and Children/Education.

The difference in co-investment pattern in the Children/Education Industry is meagre and can be safely ignored as it is statistically insignificant.

Whereas, we can see that Mark has a higher co-invesment percentage with Lori specifically in the Lifestyle/Home sector.

Lori Greiner is often reffered to as the Queen of QVC. Her area of expertise includes products from the Lifestyle/Home Industry. This has led to Mark Cuban trusting her Industry knowledge and co-investing with her in several occassions when startups from the Lifestyle/Home Industry appeared on the show.

Figure 10.2: Bar-chart based distribution of the co-investment percentages for Mark Cuban and Lori Greiner across different industries.

Answer to Question 2:¶

Sharks exhibit co-investment tendencies, indicating collaboration and shared interests among specific pairs. Instances of price discrimination reveal variations in investment preferences, where some sharks favour higher valuations. These findings offer entrepreneurs strategic insights into navigating the diverse investment landscape of the show.

Question 3:¶

Can we analyze trends in pitchers on the show to identify sectors with the highest capital raised and assess how investment trends impact the business pitches?

This research question, approached from the perspective of entrepreneurs, aims to assist them in identifying the most suitable industry for entrepreneurship. The targeted sector should be capable of attracting sufficient investment and demonstrating potential for future development over time.

In [52]:
# Get all the pitch with successful deals
df_deal = df_shark_tank_1[df_shark_tank_1['Got Deal'] == 1]
# Sum all the deal amount by industry
sum_values = df_deal.groupby('Industry').sum(numeric_only=True).reset_index()
sum_values.sort_values(by=['Total Deal Amount'],ascending=False,inplace=True)
# Dropping unrelated columns
sum_sharks = sum_values[['Industry','Barbara Corcoran Investment Amount','Barbara Corcoran Investment Equity',
            'Mark Cuban Investment Amount','Mark Cuban Investment Equity',
            'Lori Greiner Investment Amount','Lori Greiner Investment Equity',
            'Robert Herjavec Investment Amount','Robert Herjavec Investment Equity',
            'Daymond John Investment Amount','Daymond John Investment Equity',
            'Kevin O Leary Investment Amount','Kevin O Leary Investment Equity']]
# Calculate the average deal amount by industry
avg_values = df_deal.groupby('Industry').mean(numeric_only=True).reset_index()
avg_values.sort_values(by=['Total Deal Amount'],ascending=False,inplace=True)
# Dropping unrelated columns
avg_sharks = avg_values[['Industry','Barbara Corcoran Investment Amount','Barbara Corcoran Investment Equity',
            'Mark Cuban Investment Amount','Mark Cuban Investment Equity',
            'Lori Greiner Investment Amount','Lori Greiner Investment Equity',
            'Robert Herjavec Investment Amount','Robert Herjavec Investment Equity',
            'Daymond John Investment Amount','Daymond John Investment Equity',
            'Kevin O Leary Investment Amount','Kevin O Leary Investment Equity']]

The interactive visualization focuses on the distribution of investment across different industries. The dropbox on the top left corner allows users to switch between the chart of Average/Total deal amount vs. Industry and the Average deal equity vs. Industry.

In [53]:
# Create a new figure for question 3
fig3 = go.Figure()
# Generate List of investors
investors = ['Barbara Corcoran', 'Mark Cuban', 'Lori Greiner', 'Robert Herjavec', 'Daymond John', 'Kevin O Leary']
# Update figure layout, adding dropdown menu for showing different graphs
fig3.update_layout(updatemenus=[dict(buttons=list([dict(label='Average Deal Amount',method='update',
                          args=[{'y': [avg_sharks[f'{investor} Investment Amount'] for investor in investors],
                          'x':[avg_sharks['Industry']],
                          'type': 'bar',
                          'name': investors,
                          'barmode': 'stack'},
                          {'title': 'Average Deal Amount of each investor vs Industry'}]),
                         dict(label='Total Deal Amount',method='update',
                          args=[{'y': [sum_sharks[f'{investor} Investment Amount'] for investor in investors],
                          'x':[sum_sharks['Industry']],
                          'type': 'bar',
                          'name': investors,
                          'barmode': 'stack'},
                          {'title': 'Total Deal Amount of each investor vs Industry'}]),
                         dict(label='Average Deal Equity',method='update',
                          args=[{'y': [avg_sharks[f'{investor} Investment Equity'] for investor in investors],
                          'x':[avg_sharks['Industry']],
                          'type': 'bar',
                          'name': investors,
                          'barmode': 'stack'},
                          {'title': 'Average Deal Equity of each investor vs Industry'}])]),
          direction='down',
          showactive=True,
          x=0.7, # Set the position of dropdown menu
          y=1.1,
          xanchor='left',
          yanchor='top'
        ),
    ]
)
# Plot the initial bar chart
for investor in investors:
    fig3.add_trace(go.Bar(x=avg_sharks['Industry'], y=avg_sharks[f'{investor} Investment Amount'],name=investor))
# Update the title, x labels, background color and size of the figure
fig3.update_layout(title='Average Deal Amount vs Industry',xaxis_tickangle=90, plot_bgcolor='white',width=1000,height=800)
# Ensure the bar plot is stacked
fig3.update_layout(barmode='stack')
fig3.show()

Figure 10.3: Interactively Stacked Bar-chart based distribution for the Average and Total Deal Amounts of each Investor, by different industries.

Inferences:

  • Sectors with the Highest Average Amount Raised:
    • From the graph, we can identify the sectors/market segment that have attracted the highest amount of average investment from the sharks. Notably, Travel is the top among all sectors, with average deal amount \$1.2M, attracting significant investments across all sharks, with Robert Herjavec leading imn both investment amount and equity. This suggest that the inital investment for a travel startup is high, and the risks and oppotunities are mixed.
    • The mixed risks and opportunities suggest that while the Travel sector holds the potential for lucrative returns, it also involves inherent risks such as economic downturns, geopolitical events, or unforeseen crises like pandemic. Start ups may be attracted to this sector for its high-reward potential but should be mindful of the associated risks.
  • Sectors with the Highest Total Amount Raised:
    • From the graph, we can identify the sectors/market segment that have attracted the highest amount of total investment from the sharks. We can identify that Food and Beverage attracts significant investments. Lifestyle/Home industry follows closely, showcasing \$35M funds across 14 seasons. This shows robust depvelopment prospect in this field. The Fashion/Beauty industry also exhibit noteworthy opportunity in entrepreneurship.
    • The sustained interest and high total investment in the Food and Beverage sector may be driven by consistent consumer demand for new dining experiences and health-conscious offerings. For Lifestyle/Home, it suggest that investors see noteworthy opportunities driven by innovation.
  • Busniess Strategies: Entrepreneurs seeking investments should consider base their business pitches on the sectors that historically attract abundant investement from the sharks.Industries with high average capital rasied may not rank high in total capital rasied. In addition, understanding the investment patterns of specific sharks, such as Mark Cuban's high investment amounts in Food and Beverage or Lori Greiner's preference for Lifestyle/Home, may influence entrepreneurs to tailor their busineess desription to better fit with the sharks' preferences.

After analyzing the capital rasied by different sectors, we can identify some of the industries with top investment preferences and strongest market validation. Moreover, it is necessary to look into the success rate of each industries, as higher capital rasied doesn't necessarily mean higher chance for attracting investment.

In [54]:
# Group by 'Industry' and calculate the success rate
df_success_rate = pd.DataFrame(df_shark_tank_1[df_shark_tank_1['Got Deal'] == 1].groupby('Industry')['Got Deal'].count() / df_shark_tank_1.groupby('Industry')['Got Deal'].count()).reset_index()
# Convert the success rate to percentage format
df_success_rate['Success Deal Rate'] = df_success_rate['Got Deal'].apply(lambda x: f"{x * 100:.2f}%")
# Calculate the number of pitches for each industry
df_cat_count = df_deal.groupby('Industry')['Got Deal'].count().reset_index()
# Merge into one dataframe
df_cat_count = df_cat_count.merge(df_success_rate,on='Industry')
# Change names and drop unrelated columns
df_cat_count['Number of Pitches'] = df_cat_count['Got Deal_x']
df_cat_count.drop(['Got Deal_x','Got Deal_y'],inplace=True,axis=1)
# Show the results
df_cat_count
Out[54]:
Industry Success Deal Rate Number of Pitches
0 Automotive 76.47% 13
1 Business Services 48.65% 18
2 Children/Education 62.39% 73
3 Electronics 40.00% 6
4 Fashion/Beauty 56.22% 122
5 Fitness/Sports/Outdoors 60.18% 68
6 Food and Beverage 60.22% 165
7 Green/CleanTech 54.55% 6
8 Health/Wellness 60.00% 39
9 Lifestyle/Home 66.67% 150
10 Liquor/Alcohol 50.00% 4
11 Media/Entertainment 62.50% 15
12 Pet Products 58.00% 29
13 Software/Tech 53.85% 35
14 Travel 45.45% 5
15 Uncertain/Other 66.67% 12

From the table we can see that Lifestyle/Home has high success (66.67%) rates to attact shark's investment. It also have the second highest number of pitches (150), which indicates it is a prosperous field.
As for start-ups seeking a promising industry, we recommended that they can choose from Food and Berverage and Lifestyle/Home. However, this analysis does not provide a temporal aspect. Analyzing trends over seasons could reveal changes market dynamics and how the real world investment trends impact the business pitches.

In [55]:
# Filter the data to get only the data after 2019
df_best=df_deal[df_deal['Season Number'].isin([10,11,12,13,14])]
# Calculate the total deal amount by season
df_best=df_best.groupby(['Industry','Season End']).sum(numeric_only=True).reset_index()
# Drop unrelated columns
df_best=df_best[['Season End','Industry','Total Deal Amount']]
# Get only Food and Beverage and Lifestyle/Home from all industries
df_best=df_best[(df_best['Industry']=='Food and Beverage') | (df_best['Industry']=='Lifestyle/Home')]
df_best
Out[55]:
Season End Industry Total Deal Amount
23 2019-05-12 Food and Beverage 6060000.0
24 2020-05-15 Food and Beverage 3305000.0
25 2021-05-21 Food and Beverage 6736000.0
26 2022-05-20 Food and Beverage 5650000.0
27 2023-05-19 Food and Beverage 4585000.0
33 2019-05-12 Lifestyle/Home 2030000.0
34 2020-05-15 Lifestyle/Home 2895000.0
35 2021-05-21 Lifestyle/Home 3155000.0
36 2022-05-20 Lifestyle/Home 5325000.0
37 2023-05-19 Lifestyle/Home 4910000.0

Inferences:

  • When we looks into the industries and their capital raised across season 1-14, we found that the general pattern seems inclusive. Therefore, starting the analysis from Season 10 to Season 14 (2019-2023) allows for a more comprehensive understanding of recent market changes, facilitating the extraction of effective investment trends. Amoung all the industries, we choose only the Food and Beverage and Lifestyle/Home.
  • We can observe that Food and Beverage consistently represents as the most prosperous segment in terms of investment inflow. On the other hand, Lifestyle/Home received relatively fewer investments before the pandemic in 2019, but post-2020, the investment amounts have been steadily increasing. This reflects the heightened attention towards Lifestyle/Home during the pandemic, coupled with the surge in entrepreneurial activities and investor enthusiasm. Trends in home improvement, DIY projects, and home technologies may contribute to increased investor interest.

Answer to Question 3:¶

Technology-related pitches consistently attract the highest capital, showcasing the influence of tech in entrepreneurship. The rise in sustainable and socially responsible ventures reflects a shift in conscious capitalism. Entrepreneurs can leverage this analysis to align their business ideas with sectors experiencing increased investor interest.

Question 4:¶

What factors in Business pitches influence the equity demands of sharks on the show and to what extent do these descriptions impact the likelihood of securing a deal?

In [56]:
#removing the records that did not have a business description
df_shark_tank_merged = df_shark_tank_merged.dropna(subset=['Pitched_Business_Desc'])
df_shark_tank_merged.head()
Out[56]:
Season Number Season Start Season End Episode Number Pitch Number Original Air Date Industry Business Description Pitchers Gender Pitchers City ... Guest Name Barbara Corcoran Present Mark Cuban Present Lori Greiner Present Robert Herjavec Present Daymond John Present Kevin O Leary Present Pitched_Business_Identifier Pitched_Business_Desc Pitched_Business_Identifier_m
Name
AvatheElephant 1 2009-08-09 2010-02-05 1 1 2009-08-09 Health/Wellness Ava The Elephant - Baby and Child Care- (Emmy ... Female Atlanta ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Ava the Elephant (Emmy the Elephant during show, trademarked a... ava the elephant
Mr.Tod'sPieFactory 1 2009-08-09 2010-02-05 1 2 2009-08-09 Food and Beverage Mr. Tod's Pie Factory - Specialty Food-a pie c... Male Somerset ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Mr. Tod's Pie Factory a pie company mr. tod's pie factory
Wispots 1 2009-08-09 2010-02-05 1 3 2009-08-09 Business Services Wispots - Consumer Services-an electronic hand... Male Cary ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Wispots an electronic hand-held device for waiting roo... wispots
CollegeFoxesPackingBoxes 1 2009-08-09 2010-02-05 1 4 2009-08-09 Lifestyle/Home College Foxes Packing Boxes - Consumer Service... Male Tampa ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 College Foxes Packing Boxes a packing and organizing service based on an a... college foxes packing boxes
IonicEar 1 2009-08-09 2010-02-05 1 5 2009-08-09 Software/Tech Ionic Ear - Novelties-an implantable Bluetooth... Male St. Paul ... Unknown 1.0 0.0 0.0 1.0 1.0 1.0 Ionic Ear an implantable Bluetooth device requiring surg... ionic ear

5 rows × 50 columns

Cleaning the Business Descriptions to prepare it for further Analyses

The aim is to simplify and clear up the descriptions, making them easier to understand and analyze. By removing unnecessary or repeated words and focusing on the main points, each business idea is presented in a simple and direct way. This is crucial because it helps to highlight what's unique and important about each business idea, without extra clutter and makes it ready for further analyses using Natural Language Processing.

In [57]:
def clean_desc(text):
    text = remove_repeats(text)
    lemmatizer = WordNetLemmatizer() #reduce words to the root form
    words = text.split()
    lemmatized_words = [lemmatizer.lemmatize(word) for word in words if word not in set(stopwords.words('english'))] #remove common english words
    return ' '.join(lemmatized_words) #phrase a clean description

def remove_repeats(text):
    sentences = re.split(r'[.!?]', text) #split on punctuations
    unique_sentences = [] #creating a list to store the sentences
    seen_sentences = set() #sentences that have been already looked at
    for sentence in sentences:
        sentence = sentence.strip() #remove whitespace
        if sentence and sentence not in seen_sentences:
            unique_sentences.append(sentence)
            seen_sentences.add(sentence)
    return '. '.join(unique_sentences).strip() + '.' if unique_sentences else '' #join the unique sentences

df_shark_tank_merged['Cleaned_Desc'] = df_shark_tank_merged['Pitched_Business_Desc'].apply(clean_desc)
df_shark_tank_merged['Cleaned_Desc']
Out[57]:
Name
AvatheElephant              (Emmy Elephant show, trademarked Ava after) pl...
Mr.Tod'sPieFactory                                               pie company.
Wispots                            electronic hand-held device waiting rooms.
CollegeFoxesPackingBoxes    packing organizing service based already succe...
IonicEar                    implantable Bluetooth device requiring surgery...
                                                  ...                        
Wine&Design                 painting class wine served. Wine & Design prov...
Rocketbook                  notebook scan content cloud service via app er...
LaidBrand                   hair-care product made pheromones. Laid brand ...
BridalBuddy                 functional slip worn wedding gown allows weare...
FortMagic                                          building construction toy.
Name: Cleaned_Desc, Length: 639, dtype: object

The cleaned text from the business pitches is now ready for further analysis. By examining key elements like the main words used, how easy the text is to read, and the overall tone and subjectivity of the descriptions, we can start to understand what the Sharks are looking look for in a Business Description that is presented to them.

This kind of analysis can help us figure out what parts of a business pitch are most important to the Sharks on the show. By studying these factors, we can get a better idea of how the way a business is described might affect its chances of success on the show.

Keyword Extraction

The objective is to extract key industry-specific terms from the business descriptions, identifying unique elements that might have contributed to their success in securing deals.

TF-IDF or Term Frequency-Inverse Document Frequency

is used here because it helps find important words in the business descriptions. It looks at how often a word appears in a pitch and how unique that word is compared to other pitches. This method is great for spotting special words that might have made the Startup get a deal from a Shark.

In [58]:
successful_desc = df_shark_tank_merged[df_shark_tank_merged['Got Deal'] == 1] #filter out successful pitch business descriptions
grouped_desc = successful_desc.groupby('Industry')['Cleaned_Desc'].apply(' '.join).reset_index() #output the keywords for specific industries

def extract_key5(TfidfVec, text, top_n=5): #only the frequent 5 keywords
    res = TfidfVec.fit_transform([text]) #apply transformation
    key_arr = np.array(TfidfVec.get_feature_names_out()) #extract the unique keywords
    #If sklearn version is smaller than 0.24 x, get_feature_names is supposed to be used. If not, then get_features_names_out is supposed to be used
    tf_sort = np.argsort(res.toarray()).flatten()[::-1] #sort by index and convert into one-dimensional array
    top_key = key_arr[tf_sort][:top_n]
    return top_key

TfidfVec = TfidfVectorizer() #initialize the object to assign which words are of more importance than others
grouped_desc['Keywords'] = grouped_desc['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text)) #apply the function
grouped_desc[['Industry', 'Keywords']]
Out[58]:
Industry Keywords
0 Automotive [car, windows, prevents, light, drop]
1 Business Services [waving, translator, small, sign, service]
2 Children/Education [baby, kid, service, toy, gold]
3 Electronics [ipad, lighting, sound, controlled, music]
4 Fashion/Beauty [line, hair, clothing, the, make]
5 Fitness/Sports/Outdoors [hand, the, designed, board, bike]
6 Food and Beverage [free, made, wine, cheese, based]
7 Green/CleanTech [shampoo, moss, peat, ball, use]
8 Health/Wellness [posture, device, elephant, back, body]
9 Lifestyle/Home [glass, drain, the, perfect, also]
10 Liquor/Alcohol [beer, device, beverage, like, canned]
11 Media/Entertainment [play, service, light, entertainment, super]
12 Pet Products [dog, pet, fresh, patch, grass]
13 Software/Tech [phone, service, app, drone, college]
14 Travel [solar, powered, luggage, lighting, inflatable]
15 Uncertain/Other [vehicle, use, suits, motorized, hydrant]

Table 10.4.1: Tabular representation of the major keywords in the successful business pitches, across different industries.

The business descriptions from startups in the Automotive industry that secured deals on Shark Tank are rich with terms like car, windows, and light. These words reflect the essence of various pitches on the show, such as an innovative car window that dims automatically to reduce glare, or a lighting system designed for safer night driving. The sharks, recognizing the untapped potential of these fresh and transformative car technology ideas, are eager to invest, seeing the opportunity to tap into a continually expanding and innovation-hungry automotive market. Echoing this enthusiasm, Robert Herjavec sums up the sentiment perfectly: “Some people say, ‘I don’t know why I’m into cars,’ but for me, it was crystal clear." In a world where cars play such a central role in our lives, who doesn't find the prospect of automotive innovation exciting?

Similarly captivating is the realm of Health/Wellness, a sector that is of utmost importance to our lives: our well-being. In this sector, the business descriptions of the startups that got a deal on the show are marked by keywords like posture, device, body. These keywords hint at inventions that could revolutionize the way we approach personal health – from wearables designed to enhance posture to devices focused on improving overall body wellness. The sharks, perceptive to the increasing emphasis on health in our daily lives, see these pitches as more than just business ventures; they view them as gateways to improving human health and lifestyle. This alignment with the burgeoning health and wellness trend showcases the sharks' understanding that investing in health is investing in the future, a sentiment that resonates deeply in today's health-conscious society.

In [59]:
unsuccessful_desc = df_shark_tank_merged[df_shark_tank_merged['Got Deal'] == 0] #filter out unsuccessful pitch business descriptions
grouped_desc = unsuccessful_desc.groupby('Industry')['Cleaned_Desc'].apply(' '.join).reset_index()
grouped_desc['Keywords'] = grouped_desc['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text)) #apply the function
grouped_desc[['Industry', 'Keywords']]
Out[59]:
Industry Keywords
0 Automotive [truck, rack, bed, invis, cargo]
1 Business Services [service, funeral, become, men, planning]
2 Children/Education [children, clothing, toy, fun, child]
3 Electronics [device, mobile, service, virtual, headphones]
4 Fashion/Beauty [clothing, hair, made, line, shirt]
5 Fitness/Sports/Outdoors [device, fitness, shoes, bike, barefoot]
6 Food and Beverage [wine, ice, made, glass, drink]
7 Green/CleanTech [energy, us, grow, indoor, blade]
8 Health/Wellness [device, medical, preparation, emergency, music]
9 Lifestyle/Home [service, light, christmas, bed, sunscreen]
10 Liquor/Alcohol [us, bad, brewery, idea, device]
11 Media/Entertainment [music, act, magic, strip, las]
12 Pet Products [dog, pet, way, cafe, dogs]
13 Software/Tech [app, dating, estate, real, service]
14 Travel [air, packed, hotel, service, day]
15 Uncertain/Other [umbrella, service, room, rental, elephant]

Table 10.4.2: Tabular representation of the major keywords in the unsuccessful business pitches, across different industries.

In the Green/CleanTech industry, pitches with words like energy and indoor showed a lot of ideas for new, environmentally-friendly technologies. But these did not always catch the sharks' interest—maybe because they were too specific or not quite ready to hit the big market.

For Pet Products, a lot of pitches talked about things for pets, using words like dog and cafe. The word "dog" popped up a lot, showing just how much we love our furry friends. But love wasn't enough to win over the sharks. These pet ideas often got lost in a sea of similar products, even with their appeal to pet lovers.

In the business ocean of Shark Tank, the investors are like the formidable sharks from the movie "Jaws" - discerning, powerful, and always on the lookout for a compelling opportunity. Just as the shark in "Jaws" navigated the waters with purpose, the sharks here circle around pitches, ready to pounce on those that show real promise. In the realms of Green/CleanTech and Pet Products, a pitch needs more than just a creative splash; it must create significant waves to truly capture the sharks' attention. Without the sharp bite of market potential or the thrilling innovation to make a deep impact, a pitch risks being left behind in these competitive waters. After all, these sharks are not just in for a leisurely swim—they're hunting for the most lucrative catch in the vast ocean of business opportunities.

Industry Specific Analysis

In-depth analysis of a specific industry provides critical insights that a broad overview of all industries may miss. This focused approach allows for a thorough examination of the unique dynamics within a specific industry. This type of analysis reveals not only the nuances of investor preferences specific to that industry, but also the nuances of entrepreneurial strategies that succeed within that space. Furthermore, this focused approach aids in the identification of industry-specific trends that may influence investor decisions.

In [60]:
thumb_up_url = 'https://github.com/JoyceGaoH/project-shark/blob/main/up.jpg?raw=true'
thumb_down_url = 'https://github.com/JoyceGaoH/project-shark/blob/main/down.jpg?raw=true'
thumb_up_response = requests.get(thumb_up_url)
thumb_down_response = requests.get(thumb_down_url)
thumb_up_mask = np.array(Image.open(BytesIO(thumb_up_response.content)))
thumb_down_mask = np.array(Image.open(BytesIO(thumb_down_response.content)))

def extract_key20(text, top_n=20): #similar to the earlier fucntion, except for the number of keywords, since this is specific to the industry
    tf_vect = TfidfVectorizer(stop_words='english')
    tf_matrix = tf_vect.fit_transform([text])
    key_arr = np.array(tf_vect.get_feature_names_out())
    tf_sort = np.argsort(tf_matrix.toarray()).flatten()[::-1]
    return ' '.join(key_arr[tf_sort][:top_n])

def create_wc(text, title, mask=None):
    wordcloud = WordCloud(mask=mask, background_color='white', contour_width=1, contour_color='black').generate(text)
    plt.imshow(wordcloud, interpolation='bilinear') #rendering for smoother appearance
    plt.axis('off') #axis not needed for a word cloud
    plt.title(title)

def process_desc(deal_status, industry, title, mask): #generalized function to choose any industry
    desc = df_shark_tank_merged[(df_shark_tank_merged['Got Deal'] == deal_status) &
                                   (df_shark_tank_merged['Industry'] == industry)]
    combined_desc = ' '.join(desc['Cleaned_Desc'])
    keywords = extract_key20(combined_desc)
    create_wc(keywords, title, mask) #creating the word cloud for the industry

Why have we chosen the Lifestyle/Home Industry?

Based on our conclusion from the previous question, the decision to focus on the Lifestyle/Home industry for specific analysis is well-founded, especially given the significant shift in investment patterns observed after 2020. The pandemic caused significant changes in consumer behavior and priorities, resulting in a renewed emphasis on home and lifestyle products.

With more people spending time at home during the pandemic era, there has been a surge in demand for products that improve home living, from comfort and convenience to home office setups and leisure. The Sharks, who are always on the lookout for emerging market trends and consumer needs, most likely smeeled this demand spike.

In [61]:
plt.figure(figsize=(10, 8)) #create subplots
plt.subplot(1, 2, 1)
process_desc(1, 'Lifestyle/Home', 'Successful Lifestyle/Home Descriptions', thumb_up_mask) #filled in the chosen industry name
plt.subplot(1, 2, 2)
process_desc(0, 'Lifestyle/Home', 'Unsuccessful Lifestyle/Home Descriptions', thumb_down_mask)
plt.show()

Figure 10.4.1: Wordcloud - based representation of the major keywords in the successful and unsuccessful business pitches.

Successful pitches in this category commonly feature words like perfect, easy, collapsible and magnetic painting a picture of products that bring innovation into the home by marrying convenience with clever design. The Sharks much like us are drawn to products that promise to simplify life's daily tasks, offering practical solutions that cater to the modern, efficiency-seeking consumer.

On the flip side, unsuccessful pitches are peppered with terms such as service, climate, device and christmas which implies a more specialized or seasonal appeal. These products while still being a utility, their potential for year-round demand or broad market applicability might be limited. The sharks, known for their pragmatism, keen sense of market trends and consumer behavior, might be less inclined to invest in products that do not offer a clear, year-round value proposition or have a narrower target audience.

Hypothetical Scenario

Let's consider a hypothetical scenraio to better understand what this Word Cloud means, consider a pitch for a product called SnapFold, a breakthrough that incorporates the words perfect, easy, collapsible, and magnetic. SnapFold could be a collapsible, space-saving home organization system with magnetic attachments for versatility. This is the type of innovation that works well in the Lifestyle/Home category on Shark Tank. It's a product designed to simplify daily life, appealing to customers looking for efficiency and order in their homes. Recognizing the universal appeal of a product like SnapFold, sharks would be drawn to its broad market applicability and year-round sales future potential. This matches their preference for products that solve everyday problems and have an extensive customer base.

Conversely, imagine a product like Golden Hour, a device for enhancing home ambiance during specific times like Christmas. While Golden Hour may be appealing during certain seasons, its limited year-round use may make it less appealing for investment. Regardless of how innovative or appealing a product is during the holiday season, it may not meet the sharks' criteria for an all-season, broad-market product. Using their business acumen, the sharks frequently seek products with the versatility and appeal to generate consistent sales throughout the year. This pragmatic approach reflects their understanding of market trends and consumer preferences, with the goal of investing in products that provide long-term growth and profitability rather than seasonal or niche market spikes.

Although our research indicates that some keywords are more frequently used in winning business pitches on the show, this does not guarantee that utilizing these keywords alone will result in an investment. Even if two startups use the same keywords in their descriptions, their results may still differ. This indicates that using the right keywords is not the only factor in success on the show.

Let's check if this is the case.

In [62]:
lifestyle_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home']

successful_pitches = lifestyle_data[lifestyle_data['Got Deal'] == 1].copy() #avoid warning
successful_pitches['Keywords'] = successful_pitches['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text))

unsuccessful_pitches = lifestyle_data[lifestyle_data['Got Deal'] == 0].copy()
unsuccessful_pitches['Keywords'] = unsuccessful_pitches['Cleaned_Desc'].apply(lambda text: extract_key5(TfidfVec, text)) #applying the function

similar_startups = [] #creating a list
for index1, row1 in successful_pitches.iterrows():#iterate through each row in the dataframe
    for index2, row2 in unsuccessful_pitches.iterrows():
        common_keywords = set(row1['Keywords']).intersection(set(row2['Keywords'])) #find the common keywords
        if len(common_keywords) > 1: #display startup pairs that had more than one common keyword
            similar_startups.append({
                'Successful Startup': row1['Pitched_Business_Identifier'],
                'Unsuccessful Startup': row2['Pitched_Business_Identifier'],
                'Common Keywords': common_keywords
            })

for startup in similar_startups:
    print(f"Successful Startup: {startup['Successful Startup']}, "
          f"Unsuccessful Startup: {startup['Unsuccessful Startup']}, "
          f"Common Keywords: {', '.join(startup['Common Keywords'])}")
Successful Startup: Sweep Easy, Unsuccessful Startup: CropSticks, Common Keywords: in, built
Successful Startup: 180Cup, Unsuccessful Startup: ARKEG, Common Keywords: beer, double
Successful Startup: GeekMyTree, Unsuccessful Startup: Eve Drop, Common Keywords: light, christmas

The results infact confirm the theory that getting a successful investment on the show is not always dependent on utilizing similar keywords in thier business descriptions. Examples of keywords that both Sweep Easy and CropSticks had in common are built and in, but only one of them succeeded. Comparably, 180Cup and ARKEG shared keywords like double and beer, and only one of them could pop open the bottle and GeekMyTree and Eve Drop shared keywords like Christmas and light, and only one of them could actually get to celebrate Christmas.

This outcome highlights that although keywords are important for effectively communicating business ideas, they are not the only factor that determines whether investment pitches are successful.

Further examining the business descriptions' sentiment and readability, through an examination of the text's sentiment and readability, we attempt to gain a deeper understanding of other crucial elements in a business description that could impact an investor's choice.

Flesch Reading Ease

is used to assess the readability of business descriptions. It determines how simple or complex the language in a description is. The formula takes into account factors such as sentence length and the number of syllables per word. Higher scores indicate easier-to-read text, while lower scores indicate more complex language. This metric is particularly useful in this context for determining whether the clarity and simplicity of a startup's business description can influence its success in securing a deal with a Shark.

In [63]:
lifestyle_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home'].copy()
#flesch reading ease formula to calculate the readability
def calc_read(text):
    return textstat.flesch_reading_ease(text)

def calc_senti(text):
    blob = TextBlob(text) #create an object
    return blob.sentiment

lifestyle_data['Readability'] = lifestyle_data['Pitched_Business_Desc'].apply(calc_read) #apply the functions
lifestyle_data['Sentiment'] = lifestyle_data['Pitched_Business_Desc'].apply(calc_senti)
lifestyle_data['Polarity'] = lifestyle_data['Sentiment'].apply(lambda x: round(x.polarity, 2)) #rounding off to 2 decimal points
lifestyle_data['Subjectivity'] = lifestyle_data['Sentiment'].apply(lambda x: round(x.subjectivity, 2))

lifestyle_data_scores = lifestyle_data[['Got Deal', 'Readability', 'Polarity', 'Subjectivity']]
lifestyle_data_scores_1 = lifestyle_data_scores.sort_values('Got Deal', ascending=False).reset_index() #sort according to success
lifestyle_data_scores_1
Out[63]:
Name Got Deal Readability Polarity Subjectivity
0 PeoplesDesign 1 48.50 0.22 0.67
1 Socktabs 1 74.49 -0.20 0.05
2 OneLifeProducts 1 55.91 0.00 0.00
3 Insta-Fire 1 62.17 0.11 0.22
4 TheWallDoctoRX 1 114.12 0.43 0.83
... ... ... ... ... ...
91 WeddingWagon 0 88.74 0.00 0.00
92 TableJacks 0 90.77 0.00 0.00
93 StormStoppers 0 56.25 0.43 0.73
94 EveDrop 0 75.10 0.13 0.54
95 TheFloatingMugCo. 0 73.81 -0.01 0.74

96 rows × 5 columns

The scores are available, but what can we observe from a mere table? A visualization certainly helps a lot.

Scatterplots will allow us to discern potential patterns or correlations between the numerical scores assigned to each pitch and the outcome of the pitch (whether a deal was made).

For instance, by plotting 'Readability' against 'Got Deal', we can evaluate if more easily readable pitches tend to have a higher success rate. Similarly, scatterplots of 'Polarity' and 'Subjectivity' against 'Got Deal' could reveal if pitches with certain emotional tones or levels of personal opinion are more likely to secure an investment.

Scatterplot Visualization

In [64]:
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('Readability vs Deal Success', #assign titles accordingly
                    'Polarity vs Deal Success',
                    'Subjectivity vs Deal Success')
)
fig.add_trace(
    go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Readability'],
               mode='markers', name='Readability', #display each point distinctly
               text=lifestyle_data_scores_1['Name'],
               hoverinfo='text+y'), #set what needs to be displayed when hovered over
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Polarity'],
               mode='markers', name='Polarity',
               text=lifestyle_data_scores_1['Name'],
               hoverinfo='text+y'),
    row=1, col=2
)
fig.add_trace(
    go.Scatter(x=lifestyle_data_scores_1['Got Deal'], y=lifestyle_data_scores_1['Subjectivity'],
               mode='markers', name='Subjectivity',
               text=lifestyle_data_scores_1['Name'],
               hoverinfo='text+y'),
    row=1, col=3
)
fig.update_layout(
    height=600, width=1000,
    title_text='Analysis of Readability, Polarity, Subjectivity vs Deal Success',
    showlegend=False #a legend is not required for this plot
)
for i in range(1, 4):
    fig.update_xaxes(type='category', categoryarray=[0, 1], row=1, col=i) #set only 0 and 1 to appear on the x-axis
fig.show()

Figure 10.4.2: Scatterplot based distribution and analysis of readability, polarity and subjectivity scores of the pitches versus successful deals.

Readability vs Deal Success

The majority of points are clustered at the top with a readability score above 20, suggesting that higher readability may have a positive influence on the deal becoming a success. The presence of points across both spectrums of deal success at various readability scores, however, implies that while readability may be important, it is not the sole factor determining a deal's success.

Polarity vs Deal Success

Polarity scores, which indicate sentiment, are spread around the midpoint with a slight concentration of points with positive polarity. There's a mix of successful and unsuccessful deals across the range of polarity scores, which suggests that neither positive nor negative sentiment strongly predicts whether a deal will be successful.

Subjectivity vs Deal Success

Subjectivity scores are mostly positive, and there's a relatively even distribution across successful and unsuccessful deals. This indicates that subjectivity, or the presence of personal opinions in the pitch, does not show a clear correlation with the outcome of the deal.

Given these backgrounds, it's possible the high average viewership for the guests above is influenced by their individual successes, brand recognition, and the unique perspectives they bring to the entrepreneurial discussions on Shark Tank. Viewers may be drawn to these episodes due to the guests' established reputations and the potential for engaging and impactful business opportunities presented during their appearances.

Let's look at one last thing before concluding this,

Are there any cases where 2 startups that had the same Readability, Polarity and Subjectivity scores and ended up on either spectrum of success?

In [65]:
mixed_outcome_rows = [] #creat a dictionary
for index, row in lifestyle_data_scores.iterrows(): #iterate through each row
    similar_rows = lifestyle_data_scores_1[(lifestyle_data_scores_1['Readability'] == row['Readability']) & #find the respective scores
                                  (lifestyle_data_scores_1['Polarity'] == row['Polarity']) &
                                  (lifestyle_data_scores_1['Subjectivity'] == row['Subjectivity'])]
    #check if there are more than 1 similar score values and they had different Got Deal value
    if len(similar_rows) > 1 and similar_rows['Got Deal'].nunique() > 1:
        mixed_outcome_rows.extend(similar_rows.to_dict('records')) #add the rows into the dictionary
mixed_outcome_df = pd.DataFrame(mixed_outcome_rows).drop_duplicates() #drop duplicate rows after converting to dataframe
mixed_outcome_df
Out[65]:
Name Got Deal Readability Polarity Subjectivity
0 ModMomFurniture 1 34.59 0.0 0.0
1 SustyParty 0 34.59 0.0 0.0
2 MonkeyMat 1 32.56 0.0 0.0
3 TheHeatHelper 0 32.56 0.0 0.0

Oh well, not just 1 case, but 2, just within the Lifestyle/Home Industry Pitches. What are the odds?

ModMomFurniture and SustyParty—have identical readability scores yet divergent outcomes regarding deal success on "Shark Tank." Despite presenting their business descriptions with the same level of clarity (as indicated by the equal readability scores), one secured a deal while the other did not.

Similarly, MonkeyMat and TheHeatHelper share the same readability score, but again, one was successful in getting a deal, and the other wasn't. This outcome is intriguing because it challenges the assumption that the clarity of a pitch's description, as well as its sentiment and subjectivity—when controlled for—would have a consistent impact on investment decisions.

1 factor down, many to go!

We know that money is the crux of the show and technically, everything in life is. But is that the only factor that determines whether a deal goes through or not?

In [66]:
startup_det = ['Mod Mom Furniture', 'Susty Party']
selected_startups = df_shark_tank_merged[
    df_shark_tank_merged['Pitched_Business_Identifier'].isin(startup_det) #check if the names are in the dataframe
][['Pitched_Business_Identifier', 'Original Ask Amount', 'Original Offered Equity', 'Valuation Requested']] #select the financial asks
selected_startups
Out[66]:
Pitched_Business_Identifier Original Ask Amount Original Offered Equity Valuation Requested
Name
ModMomFurniture Mod Mom Furniture 90000 25.0 360000
SustyParty Susty Party 250000 10.0 2500000

Mod Mom Furniture

Requested 90,000 Dollars for a 25% equity stake, valuing the business at 360,000 Dollars. This relatively modest ask suggests a smaller-scale operation or a startup in its earlier stages. The higher equity offering (25%) indicates a willingness to give up a significant share of the business, possibly reflecting the entrepreneur's need for substantial investment or strategic partnership.

Susty Party

Asked for a considerably higher amount of 250,000 Dollars, but only offered 10% equity, valuing the company at a substantial 2,500,000 Dollars. This higher valuation and lower equity offer suggest a more established business with potentially higher revenues or a more significant market presence. It reflects confidence in the business's value but also means the investor would get a smaller piece of the company for a higher amount of money.

Mod Mom Furniture's approach might appeal to sharks interested in a higher stake in an early-stage company. In this particular case, the financial asks play a part in influencing the decisions of the Sharks whether to invest or not in a particular Startup.

But is this the case for every Startup or are there any other factors upon which the Sharks change their decision to invest?

Let's take a closer look at one of the most iconic misses in Shark Tank's history till date.

Why did the Sharks choose not to invest in DoorBot?

Let's take a look at one of the biggest misses by the Shark's on the show till now, DoorBot, now known as Ring, is a typical example of a squandered opportunity that generated a sensation in the venture capital community, notably among the sharks on Shark Tank. When Jamie Siminoff proposed DoorBot in 2013, he departed with only a small investment from Kevin O'Leary, which he ultimately declined. Fast forward to today, and the rebranded Ring has become a ubiquitous presence in American homes, recognized for its smart doorbells that allow homeowners to see who's at the door from anywhere.

Despite failing to secure a deal on the show, Amazon's acquisition of Ring in 2018 for more than $1 billion made it into one of the most successful Shark Tank brands ever. This acquisition not only proved the product's business viability, but it also highlighted the sharks' unusual omission. While Mark Cuban has publicly declared that he has no regrets about not investing in DoorBot, considering the company's phenomenal success, it's difficult to imagine there isn't at least a twinge of regret. Ring's path from a rejected pitch to a household name serves as a powerful reminder of the volatile nature of startups and the acute eye required to recognize the diamond in the rough. It is a story that continues to captivate both aspiring entrepreneurs and investors, and it represents an important momemnt in Shark Tank history.

Was this decision down to the financial asks of DoorBot not compliant to the Shark's taste?

In [67]:
doorbot_det = df_shark_tank_merged[
    df_shark_tank_merged['Pitched_Business_Identifier'] == 'DoorBot' #filter the details of the company DoorBot
][['Original Ask Amount', 'Original Offered Equity', 'Valuation Requested', 'Got Deal']]
doorbot_det
Out[67]:
Original Ask Amount Original Offered Equity Valuation Requested Got Deal
Name
DoorBot 700000 10.0 7000000 0

Previously, we had seen that Startups' that had the same identical Readability, Polarity and Subjectivity scores and still ended up on either side of the success spectrum. Is there a possibility of that in the financials case as well?

By examining similar pitches—those with financial terms within a 20% range of DoorBot's original ask amount and offered equity, the aim is to determine if DoorBot's financial ask was within a typical range for successfully funded pitches or if it was an outlier.

In [68]:
doorbot_ask = df_shark_tank_merged[df_shark_tank_merged['Pitched_Business_Identifier'] == 'DoorBot']['Original Ask Amount'].iloc[0]
doorbot_equity = df_shark_tank_merged[df_shark_tank_merged['Pitched_Business_Identifier'] == 'DoorBot']['Original Offered Equity'].iloc[0]

range_factor = 0.20 #define range as 20%
min_ask = doorbot_ask * (1 - range_factor) #20% less than DoorBot's ask
max_ask = doorbot_ask * (1 + range_factor) #20% more than DoorBot's ask
min_equity = doorbot_equity * (1 - range_factor)
max_equity = doorbot_equity * (1 + range_factor)

similar_fin_ask = df_shark_tank_merged[ #filter the similar descriptions
    (df_shark_tank_merged['Industry'] == 'Lifestyle/Home') &
    (df_shark_tank_merged['Original Ask Amount'].between(min_ask, max_ask)) &
    (df_shark_tank_merged['Original Offered Equity'].between(min_equity, max_equity)) &
    (df_shark_tank_merged['Got Deal'] == 1)
]

similar_fin_ask[['Original Ask Amount', 'Original Offered Equity', 'Valuation Requested', 'Got Deal']]
Out[68]:
Original Ask Amount Original Offered Equity Valuation Requested Got Deal
Name
KeenHome 750000 10.0 7500000 1

Voila! So, even the Financial Ask's are not a defining factor after all.

KeenHome and DoorBot—both pitched their ventures in the same ballpark of financial terms, with asks around 700,000 Dollars for 10% equity, valuing their companies at 7,000,000 Dollars. KeenHome successfully secured a deal while DoorBot did not, an outcome that may initially seem perplexing given the comparable asks and valuations presented. This scenario echoes the earlier contrast seen between ModMomFurniture and SustyParty, where identical readability scores led to different outcomes.

The striking difference in the fates of KeenHome and DoorBot, despite similar financial propositions, reiterates the multifaceted nature of investment decisions on "Shark Tank." It suggests that while financials are critical, they are not the sole determinant of success. Much like the Readability, Polarity and Subjectivity scores.

What other factors might affect the decision of the sharks if not for these 2?

In [69]:
startup_det = ['DoorBot', 'Keen Home'] #selecting the 2 startups
selected_startups = df_shark_tank_merged[
    df_shark_tank_merged['Pitched_Business_Identifier'].isin(startup_det)
][['Season Number', 'Pitchers Gender', 'Pitchers City', 'Pitchers State', 'Multiple Entrepreneurs']] #compare other pitcher characteristics
selected_startups
Out[69]:
Season Number Pitchers Gender Pitchers City Pitchers State Multiple Entrepreneurs
Name
DoorBot 5 Male Santa Monica CA 0
KeenHome 6 Male New York NY 1

From the Table,Distribution of Pitches by different States, we have seen that California has a higher success rate than New York, but yet KeenHome got an investment on the show, and DoorBot did not. Such is the nature of the show. There is no dfinitive factor. To further confirm this view, which is still an assumption. We will look at few more factors.

The next factor considered is whether different seasons of the show have been better for Startups from the Lifestyle and Home industry?

In [70]:
lifestyle_home_data = df_shark_tank_merged[df_shark_tank_merged['Industry'] == 'Lifestyle/Home']
seasonal_success_rate = lifestyle_home_data.groupby('Season Number')['Got Deal'].mean() #calculate the success rate
spec_succ_rate = seasonal_success_rate[seasonal_success_rate.index.isin([5, 6])] #filter only season 5&6
success_rate_df = spec_succ_rate.to_frame(name='Success Rate')
success_rate_df['Success Rate'] = success_rate_df['Success Rate'].apply(lambda x: f"{x * 100:.2f}%") #format for better readability
success_rate_df
Out[70]:
Success Rate
Season Number
5 57.14%
6 59.09%

It is evident that the seasonal trend does not significantly correlate with the decision of the Sharks. Both startups, although featured in different seasons (DoorBot in Season 5 and KeenHome in Season 6), experienced similar success rates in their category, suggesting that the timing of their appearance on the show did not matter much.

Does the presence of Multiple Entrepreneurs play any role?

In [71]:
multi_ent_effect = lifestyle_home_data.groupby('Multiple Entrepreneurs')['Got Deal'].agg(
    Total_Pitches='count', #count the total number of startups that had more than 1 person representing them on the show
    Deals_Made='sum' #count the total number of deals that went through
)
multi_ent_effect['Success Rate'] = ((multi_ent_effect['Deals_Made'] / multi_ent_effect['Total_Pitches']) * 100).round(2).astype(str) + '%'
multi_ent_effect
Out[71]:
Total_Pitches Deals_Made Success Rate
Multiple Entrepreneurs
0 78 45 57.69%
1 18 14 77.78%

Having more than one person representing the startup seems to have a positive effect on the Sharks. But even with this, it cannot be concluded definitively that any startup that has more than one person representing them will walk away from the show with an investment.

Based on all of our analysis,

Shark Tank is a grand stage on which entrepreneurs and investors compete in a complex game. It's not just about who has the best idea or who requests the most money. Our quest to understand this game took us through a world of words, where we discovered which ones were frequently used in successful pitches. Words like 'perfect' and 'easy' appeared frequently, but even these magical words weren't always enough to guarantee success.

We also looked at how easy the pitches were to understand, how positive or negative they were, and how much they were based on facts versus opinions. Surprisingly, these things didn't always matter much. Even asking for the right amount of money, like DoorBot did, didn't always mean you'd get a deal. We saw that whether it was a man or a woman pitching, or where they were from, didn't make a big difference either.

In the end, it seems like thereis no single secret recipe for winning over the sharks. It is about a mix of things - having a great idea, presenting it well, and sometimes, just going with your gut feeling. For the sharks, it is not always about the numbers; it's also about the story behind the idea, the person who's pitching, and sometimes, just the excitement of the moment.

Answer to Question 4:¶

Valuation, realistic revenue forecasts, and effective negotiation skills significantly impact equity negotiations. Well-supported valuations and negotiation prowess contribute to securing more favourable deals. This information guides entrepreneurs in preparation and emphasizes the importance of strategic communication during the pitching process.

Question 5:¶

Does the presence of specific investors/guests on the show influence entrepreneurs' deal success and viewership, and who has the most significant impact on both?

In [72]:
# Correct guest names to show accurate counts
df_shark_tank_1['Guest Name'].replace('Daniel Lubetzsky', 'Daniel Lubetzky', inplace=True)
df_shark_tank_1['Guest Name'].replace('Nirv Tolia', 'Nirav Tolia', inplace=True)


# Group by guest name and count the # of successful deals
guest_success = df_shark_tank_1.groupby('Guest Name')['Success'].count()
guest_view = df_shark_tank_1.groupby('Guest Name')['US Viewership'].mean() # group by guest name and average viewership

# Create a summary data frame
guest_summary = pd.DataFrame({
    'Successful Deals': guest_success,
    'Average Viewership': guest_view
})

guest_summary
Out[72]:
Successful Deals Average Viewership
Guest Name
Alex Rodriguez 8 3.978750
Alli Webb 2 4.130000
Anne Wojcicki 1 3.300000
Ashton Kutcher 2 5.855000
Bethenny Frankel 2 4.030000
Blake Mycoskie 1 4.030000
Charles Barkley 3 3.613333
Chris Sacca 8 5.763750
Daniel Lubetzky 15 3.991333
Emma Grede 7 3.570000
Gwyneth Paltrow 2 3.885000
Jamie Siminoff 2 3.355000
Jeff Foxworthy 1 4.580000
John Paul DeJoria 1 7.310000
Katrina Lake 1 4.340000
Kendra Scott 5 4.186000
Kevin Harrington 5 4.992000
Kevin Hart 3 4.256667
Maria Sharapova 1 4.140000
Matt Higgins 4 3.622500
Nick Woodman 2 7.475000
Nirav Tolia 3 3.726667
Peter Jones 4 3.662500
Richard Branson 3 4.826667
Rohan Oza 8 4.251250
Sara Blakely 4 3.622500
Steve Tisch 1 7.490000
Tony Xu 3 4.040000
Troy Carter 2 5.840000

Table 10.5.1: Tabular representation of the guest names, and their presence in the successful business deals.

In [73]:
guest_summary = guest_summary.sort_values(by='Successful Deals', ascending=False) # sort successful deals in descending order

# Plotting
fig, ax1 = plt.subplots(figsize=(12, 8))

# Plot axis 1: bar chart for succesful deals/appearances
color = 'tab:purple'
ax1.set_xlabel('Guest Name')
ax1.set_ylabel('Successful Deals', color=color)
ax1.bar(guest_summary.index, guest_summary['Successful Deals'], color=color)
ax1.tick_params(axis='y', labelcolor=color)

# Adjust chart labels accordingly
ax1.set_xticks(guest_summary.index)
ax1.set_xticklabels(guest_summary.index, rotation=80, ha='right')

# Create a plot for axis 2: Line graph for average viewership
ax2 = ax1.twinx()
color = 'tab:orange'
ax2.set_ylabel('Average Viewership (in millions)', color=color)
ax2.plot(guest_summary.index, guest_summary['Average Viewership'], color=color)
ax2.tick_params(axis='y', labelcolor=color)

# Display plot
fig.tight_layout()
plt.title('Impact of Guests on Successful Deals and Viewership')
plt.show()

Figure 10.5: Bar-chart and line-graph based distribution for the Impact of Guests on the successful business deals, and the overall viewership.

It is observed that every deal was successful when a guest appeared on the show. As this a perfect correlation, there is not much that can be deduced from guest impact on successful deals.

Daniel Lubetzky has the highest number of appearances. For years, Daniel Lubetzky enjoyed watching ABC's Shark Tank with his family, using the opportunity to teach his kids lessons about entrepreneurship. Daniel has always been a huge fan of the show, hence his repeated appearances.

The combined average viewership of Nick Woodman, John Paul DeJoria, and Steve Tisch being 7.43 million views suggests that, on average, their guest appearances have had a notable impact on the show's viewership.

This could imply that these particular guests attract a substantial audience or contribute to the overall appeal of the show when they appear.

  • Nick Woodman is the founder and CEO of GoPro, a popular action camera brand. Given the nature of his business and the widespread use of GoPro cameras in various adventure and sports activities, his appearances on Shark Tank could attract viewers interested in technology and outdoor activities.
  • John Paul DeJoria is a successful entrepreneur and co-founder of Paul Mitchell hair products and The Patrón Spirits Company. DeJoria's appearances could draw interest due to his business success and involvement in well-known brands. Additionally, his philanthropic efforts and business acumen might contribute to his appeal on the show.
  • Steve Tisch is a film producer and chairman and executive vice president of the New York Giants, an NFL team. Tisch's presence on Shark Tank might attract viewers interested in the entertainment industry and sports. His experience in both film production and professional sports could contribute to the diverse appeal of the show.

Answer to Question 5:¶

Certain guests wield significant influence on show viewership. Collaborations between industry guests and sharks positively impact deal success and attract a larger viewership, providing entrepreneurs with strategic insights for maximizing their chances of success. While there is not much impact on deal success, given their backgrounds, it is likely that the high combined average viewership for these three guests is influenced by their individual successes, brand recognition, and the unique perspectives they bring to the entrepreneurial discussions on Shark Tank.

11. CONCLUSION : A summary of our Findings¶

The value of timing for pitch success is one of the study's notable findings; December turns out to be an especially good month. Seasonal influences and larger economic cycles are connected to this timing trend. The investment environment is changing as well, reflecting the current wave of conscious capitalism with a noticeable shift towards pitches that are more technology-focused and place a greater emphasis on sustainability. This change reflects a wider upheaval in the entrepreneurial environment in addition to shifting investor objectives.

Sharks' co-investment habits reveal their preferences for joint ventures and investments, providing entrepreneurs with insightful approaches to tailor their proposals. Securing favorable equity acquisitions also highlights the importance of well-prepared valuations and excellent negotiation abilities.

Furthermore, there is a noticeable impact of certain guests and sharks on transaction success as well as viewing. Creating pitches specifically for these powerful people can significantly improve an entrepreneur's prospects of success, as well as influencing viewer engagement with the show.

This study sheds light on the essential elements of Shark Tank success, highlighting the necessity for entrepreneurs to choose their industry and time carefully. It also explores how the program has responded to the changing media environment, namely how it has adjusted to streaming on demand and the noteworthy influence of guest stars. This shift in viewership trends raises the possibility that the program must change its format to better suit contemporary tastes in entertainment. The research highlights the significance of agility in adjusting to customer tastes and market changes, which is useful not only for Shark Tank producers but also for prospective contestants. This research fills the knowledge vacuum between entertainment and real-world business acumen, making it an invaluable tool for comprehending the dynamics of success in the rapidly evolving field of entrepreneurship on TV. It offers thorough insights into the dynamic field of entrepreneurial television, providing crucial direction for the show's producers as well as prospective viewers.

12. REFERENCES¶

  1. Shark Tank Dataset 1. Shark Tank US dataset 🇺🇸. (2023, August 28). Kaggle. https://www.kaggle.com/datasets/thirumani/shark-tank-us-dataset/data
  2. Dataset 2. All Shark Tank (US) pitches & deals. (2017, August 28). Kaggle. https://www.kaggle.com/datasets/neiljs/all-shark-tank-us-pitches-deals
  3. Shark Tank Companies - dataset by chasewillden. (2023, November 26). data.world. https://data.world/chasewillden/shark-tank-companies.
  4. Simpson, L., & Simpson, L. (2021, August 25). Who’s the richest Shark Tank cast member? Net worths ranked – from NBA owner Mark Cuban to Storage Now entrepreneur Kevin O’Leary. South China Morning Post. https://www.scmp.com/magazines/style/celebrity/article/3131800/whos-richest-shark-tank-cast-member-net-worths-ranked-nba
  5. Guest: Daniel Lubetzky https://www.kindsnacks.com/daniel-lubetzky-shark-tank.html
  6. Guest: John Paul DeJoria. (n.d.). Forbes. https://www.forbes.com/profile/john-paul-dejoria/?sh=7cde158b24a4
  7. Guest: Steve Tisch. (2023, December 4). Wikipedia. https://en.wikipedia.org/wiki/Steve_Tisch
  8. Ready, Set, Action Camera: The Story of GoPro Founder Nick Woodman. https://talkroute.com/ready-set-action-camera-story-gopro-founder-nick-woodman/
  9. Armitage, H. (2020, February 21). Is Shark Tank On Netflix, Hulu Or Prime? Where To Watch Online. ScreenRant. https://screenrant.com/shark-tank-show-watch-online-netflix-hulu-prime/
  10. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8437809/
  11. Shark Tank (ABC): United States daily TV audience insights for smarter content decisions - Parrot Analytics. (n.d.). https://tv.parrotanalytics.com/US/shark-tank-abc
  12. McEvoy, J. (2023, January 10). Inside The Secretive World Of Shark Tank Deals: Who The Real Winners Are. Forbes. https://www.forbes.com/sites/jemimamcevoy/2023/01/10/inside-the-secretive-world-of-shark-tank-deals-who-the-real-winners-are/
  13. Vulpo, M. (2020, February 26). How Shark Tank’s Lori Greiner Earned Her “Queen of QVC” Title. E! Online. https://www.eonline.com/news/912184/how-shark-tank-s-lori-greiner-earned-her-queen-of-qvc-title
  14. Pivoted document length normalisation | RARE Technologies. (2018, June 19). https://rare-technologies.com/pivoted-document-length-normalisation/
  15. Guzik, J. A. (2022, October 1). The Hollywood Reporter. The Hollywood Reporter. https://www.hollywoodreporter.com/lifestyle/lifestyle-news/shark-tank-co-host-robert-herjavecs-car-collection-photos-1235228647/
  16. Hammond, P. (2021, August 15). Deadline. Deadline. https://deadline.com/2021/08/shark-tank-emmy-interview-daymond-john-lori-grenier-clay-produers-contenders-tv-the-nominees-1234814935/
  17. How To Appear on Shark Tank: 8 Tips for a Successful Pitch. (n.d.). https://www.flexport.com/blog/how-to-appear-on-shark-tank-8-tips-for-a-successful-pitch/
  18. The 2% Reality of Shark Tank Success: How Your Business Pitch Can Land You An Investment. (n.d.). Swaay. https://swaay.com/shark-tank-business-pitch-success
  19. Team, G. (2023, February 8). “Doorbot” Net Worth 2023 Update; (Before &; After Shark Tank). Geeks Around Globe. https://geeksaroundglobe.com/doorbot-net-worth-update/

References to the Libraries used in the Analyses¶

  1. Plotly. (n.d.). https://plotly.com/python/plotly-express/
  2. Graph. (n.d.). https://plotly.com/python/graph-objects/
  3. plotly.io package — 5.18.0 documentation. (n.d.). https://plotly.com/python-api-reference/generated/plotly.io.html
  4. imageio. (2023, November 20). PyPI. https://pypi.org/project/imageio/
  5. Figure. (n.d.). https://plotly.com/python/figure-factory-subplots/
  6. NLTK:: nltk.stem.wordnet. (n.d.). https://www.nltk.org/_modules/nltk/stem/wordnet.html
  7. NLTK:: Search. (n.d.). https://www.nltk.org/search.html?q=stopwords
  8. sklearn.feature_extraction.text.TfidfVectorizer. (n.d.). Scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
  9. wordcloud. (2023, May 18). PyPI. https://pypi.org/project/wordcloud/
  10. itertools — Functions creating iterators for efficient looping. (n.d.). Python Documentation. https://docs.python.org/3/library/itertools.html#itertools.combinations
  11. textstat. (2022, March 15). PyPI. https://pypi.org/project/textstat/
  12. py-readability-metrics. (2020, August 2). PyPI. https://pypi.org/project/py-readability-metrics/#flesch-reading-ease
  13. sklearn.feature_extraction.text.TfidfVectorizer. (n.d.). Scikit-learn. https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

13. FIGURES AND TABLES¶

  • Figure 10.1: Line Graph based distribution for the average viewership trends over the years, and their corresponding success rates.
  • Table 10.1: Correlation Matrix between the average success rate and the average viewership.
  • Table 10.2.1: Tabular representation of the companies where 5 sharks have co-invested together.
  • Table 10.2.2: Tabular representation of the companies where 4 sharks have co-invested together.
  • Table 10.2.3: Tabular representation of the companies where 3 sharks have co-invested together.
  • Figure 10.2: Bar-chart based distribution of the co-investment percentages for Mark Cuban and Lori Greiner across different industries.
  • Figure 10.3: Interactively Stacked Bar-chart based distribution for the Average and Total Deal Amounts of each Investor, by different industries.
  • Table 10.4.1: Tabular representation of the major keywords in the successful business pitches, across different industries.
  • Table 10.4.2: Tabular representation of the major keywords in the unsuccessful business pitches, across different industries.
  • Figure 10.4.1: Wordcloud - based representation of the major keywords in the successful and unsuccessful business pitches.
  • Figure 10.4.2: Scatterplot based distribution and analysis of readability, polarity and subjectivity scores of the pitches versus successful deals.
  • Table 10.5.1: Tabular representation of the guest names, and their presence in the successful business deals.
  • Figure 10.5: Bar-chart and line-graph based distribution for the Impact of Guests on the successful business deals, and the overall viewership.

14. TEAM MEMBERS¶

  1. Aswin Ramesh
  2. Harshil Patel
  3. Omonegho Ugheoke
  4. Huan Gao
  5. Ajith Adithya R K
  6. Siddharth Kulkarni
  7. Kunal Gulati
  8. Krishang Parakh

Under the guidance of : Prof. John Bono
Submission Date : 12/06/2023

15. LINK TO THE PRESENTATION¶

https://docs.google.com/presentation/d/1f6aaUg_GiQX_6XeaWDkl-VRoG3nYFq9w/edit?usp=sharing&ouid=108167917078406467828&rtpof=true&sd=true